Leon N. COOPER

Department of Physics and Center for Neural Science, Brown University, Providence

Introduction

Although  we  now  rank  Maxwell  among  the  greatest  of  19th  century 
physicists,  he  wrote  that  he  was  adding  little  to  the  work  Faraday  had 
already  done. 

"I  have  endeavored  to  make  it  plain  that  I  am  not  attempting  to 
establish  any  physical  theory  of  a  science  in  which  I  have  not  made  a 
single  experiment  worthy  of  the  name,  and  that  the  limit  of  my  design  is 
to  show  how  by  a  strict  application  of  the  ideas  and  methods  of  Faraday 
to  the  motion  of  an  imaginary  fluid,  everything  relating  to  that  motion 
may be distinctly represented, and thence to deduce the theory of
attractions of electric and magnetic bodies, and of the conduction of
electric currents." (Maxwell, 1856)


Modesty perhaps, but not entirely unwarranted: for in spite of his
enormous talents, the import of his inventions becomes apparent in the
light of later developments with a clarity that, for all of his genius, could
not have been visible to him.

Maxwell's  historic  achievement  was  to  write  down  the  equations  of 
electricity and magnetism in such a way as to incorporate the experimental
discoveries of Coulomb, Ampère and Faraday and to realize that these
equations  were  inconsistent.  To  make  them  consistent  he  was  forced  to 
profoundly  alter  their  character,  giving  rise  to  a  new  class  of  solutions: 


*The work on which this article is based was supported in part by the U.S. Office of Naval
Research, under contract #N00014-81-K-0136.
propagating waves whose speed (as he calculated using experimental data
on  electric  and  magnetic  susceptibilities)  corresponded  very  closely  to  the 
speed  of  light. 

"The  velocity  of  transverse  undulations  in  our  hypothetical  medium, 
calculated  from  the  electromagnetic  experiments  of  M.M.  Kohlrausch 
and  Weber,  agrees  so  exactly  with  the  velocity  of  light  calculated  from 
the optical experiments of M. Fizeau, that we can scarcely avoid the
inference  that  light  consists  in  the  transverse  undulations  of  the  same 
medium  which  is  the  cause  of  electric  and  magnetic  phenomena." 

(Maxwell, 1862.)

And  in  a  letter  to  William  Thomson  (Lord  Kelvin): 

"I  made  out  the  equations  in  the  country  before  I  had  any  suspicion  of 
the  nearness  between  the  two  values  of  the  velocity  of  propagation  of 
magnetic effects and that of light, so that I think I have reason to believe
that the magnetic and luminiferous media are identical." (Maxwell, 1861.)

He thus produced a unified field theory of electricity, magnetism and
light, the first of its kind. But even this monumental result was just the
beginning. For he opened the path to the twentieth century: the
Michelson-Morley experiment, relativity, the primacy of field theory and
symmetry considerations, Lorentz and, most recently, gauge invariance as
general symmetries underlying all physical theories.

This emerges in retrospect. And Maxwell would no doubt be enormously
pleased by the great success of the enterprise he began. But he
might remind us that his new inventions were preceded by a long
exploration of known territory. For most of his working lifetime he
applied his physical and mathematical intuition to write down a set of
equations that would summarize what was already known.
When this could be clearly stated, existing contradictions
became apparent, and the new assumptions to
remove them were relatively quickly made.


Today, I would like to discuss some work that my colleagues and I have
been  doing  recently  on  the  organization  of  the  brain.  Although  this  is 
somewhat  removed  from  what  physicists  usually  think  about,  there  is  a 
habit  of  analysis  that,  I  believe,  a  physicist  can  profitably  bring  to  complex 
problems  in  biology,  and  perhaps  in  other  areas.  Also  it  is  not  impossible 
that  a  precise  understanding  of  such  a  complex  system  could  produce 
surprises — not  a  new  fundamental  force  or  field,  but  rather  a  new 
understanding  of  the  behavior  of  large  interacting  systems  that  could 
illuminate the systems of equations that concern us in other domains (as
has  happened  previously  in  the  past  generation  with  the  problems  of 
superconductivity,  superfluidity  and  phase  transitions). 

First,  let  me  attempt  a  very  quick  description  of  the  elements  of  this 
problem. In Fig. 1, we see a view of the human brain. It is an incredibly
complex  piece  of  machinery  involving  many  individual  elements — the 
most  relevant  of  which  are  known  as  neurons  or  nerve  cells.  It  is  believed 
that  information  processing,  memory  storage,  logical  thinking,  etc.,  occurs 
among  the  neurons.  Neocortex  (new  cortex) — generally  thought  to  be  the 
thinking  part  of  the  brain — is  on  the  surface:  this  sheet  of  neurons  if 
spread  out  is  rather  large — perhaps  several  square  meters.  To  fit  it  into  a 
reasonably  sized  skull,  it  had  to  be  folded;  typical  folds  on  the  surface  of 
the  cortex  are  seen  in  the  Figure.  In  Fig.  2  a  portion  of  the  neural 
network  (in  visual  cortex)  is  shown.  We  see  here  suggested  some  of  the 
complexity  of  the  cellular  circuitry. 

The new part of the brain, the cortex, the special biological gift of higher
mammals, evolved very rapidly, in only a few million years. In contrast,
other parts, such as the brainstem that we share with reptiles and that are


Fig. 1. Side view of the human brain, from DeArmond, Fusco and Dewey,
Structure of the Brain.
A, plexiform layer; B, layer of small pyramidal cells; C, beginning of the
layer of large and medium pyramidal cells.


Fig. 2. A portion of the neural network in visual cortex, from R. Cajal,
Histologie du Système Nerveux.


mostly hard-wired and perform a great variety of control functions, took
hundreds of millions of years to develop. This is a very suggestive fact.

The  neurons  are  a  marvellous  piece  of  machinery.  Like  most  cells,  they 
share  basic  structures  to  keep  themselves  alive,  but  have  become 
extremely  specialized.  Their  primary  function  is  to  transmit  (and  probably 
also  to  store)  information.  The  fundamental  device  utilized  by  these  cells 
is an excitable membrane. The cell is capable of altering normal ion
concentrations in its interior. The proportions of ions such as Na+, Ca++,
K+ and so on in squid blood are almost those in sea water, which, by the
way, strongly suggests where the blood comes from (Table 1).

Inside a neuron there is an excess of K+ and too little Na+. This is due
to a metabolic pump which slowly pumps sodium out and potassium in
(Fig. 3). The pump can be thought of, for practical purposes, as slowly
charging a battery. A channel is left open for K+ ions. Due to the
concentration difference, the K+ ions try to get outside. But since they
Table 1

Concentrations of ions inside and outside freshly isolated axons of squid

                     Concentration (mM)
Ion          Axoplasm         Blood     Seawater
Potassium    400              20        10
Sodium       50               440       460
Chloride     40-150           560       540
Calcium      0.3 x 10^-3*     10        10

*The precise value of ionized intracellular calcium is not known. Data from
Hodgkin (1964) and Baker, Hodgkin, and Ridgway (1971); from Kuffler and
Nicholls, From Neuron to Brain.


carry  positive  charge,  they  build  up  an  electric 
potential  difference  across  the  membrane  as 
they  move  along  the  concentration  gradient.  This 
potential  difference  balances  the  concentration  difference  when  there  is  a 
potential  difference  of  about  70  millivolts  across  the  membrane. 
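
The balance described above can be checked with a short calculation. The sketch below uses the standard Nernst relation, which is not written out in this text, and assumes a temperature of 20 degrees C together with potassium concentrations of 400 mM inside and 20 mM outside (the squid axoplasm and blood figures commonly quoted from Hodgkin's data). The function name and parameters are illustrative only.

```python
import math

# Physical constants (SI units).
R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol

def nernst_potential_mv(conc_out_mm, conc_in_mm, valence=1, temp_c=20.0):
    """Equilibrium potential (inside relative to outside) in millivolts."""
    temp_k = temp_c + 273.15
    volts = (R * temp_k) / (valence * F) * math.log(conc_out_mm / conc_in_mm)
    return 1000.0 * volts

# Assumed potassium concentrations: 400 mM inside, 20 mM outside.
e_k = nernst_potential_mv(conc_out_mm=20.0, conc_in_mm=400.0)
print(f"K+ equilibrium potential: {e_k:.1f} mV")
```

Under these assumptions the potassium equilibrium potential comes out in the neighborhood of -75 millivolts, the same order as the resting potential of about 70 millivolts quoted above.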

Electrical or chemical disturbances on some region of the membrane
open the Na+ channel; there is a rush of ions locally in that region; the
negative resting potential, of about -70 millivolts, may jump to a positive
potential (about +55 millivolts); this is known as the action potential.
Eventually equilibrium is re-established in the original region. But the
potential disturbance has spread a bit; this opens the Na+ channel a bit
down the line, and then one has another action potential. So the action
potential moves down the membrane. Depending on the amount of
depolarization, in a way that I do not have time to go into, one can
convert a signal of a given intensity into a given number of action


Fig. 3. The neuron membrane. (Labels in the figure: excess of Na+ outside;
excess of K+ inside; Cl- in equilibrium; sodium pump.)
potentials per second. Thus the information is frequency modulated and
can  be  transmitted  with  essentially  no  loss  over  large  distances.  It  is  quite 
remarkable,  considering  the  materials  available,  that  information  can  be 
transmitted  with  such  accuracy. 

The  transmission  of  information  from  one  neuron  to  another  is  perhaps 
even  more  remarkable.  An  axon  of  one  neuron  will  terminate,  in  general, 
near  the  dendrite  of  another  in  a  structure  known  as  a  synaptic  junction 
(Fig.  4).  Action  potentials  arriving  from  the  axon  initiate  release  of 
transmitter  substances  that  diffuse  across  the  synaptic  cleft.  Upon  arrival 
at the post-synaptic membrane, these transmitters produce changes in
membrane conductivity, initiating a flow of ions that alters the dendrite
potential. The dendrite potentials propagate to the cell body where they
are integrated and determine the firing rate of the
post-synaptic cell. Thus the information flow continues.

It  is  now  commonly  thought  that  the  synaptic  junction  may  be  a  means 
to  store  information  (memory,  for  example)  as  well  as  to  transmit  it  from 
neuron  to  neuron.  Large  networks  of  neurons  connected  to  other  neurons 
via  modifiable  synaptic  junctions  are  what  we  have  used  to  try  to 
construct  entities  that  are  capable  of  holding  memory  and  performing 
mental  acts. 

Although  the  central  nervous  system  contains  something  of  the  order  of 
10-100  billion  neurons,  it  is  somewhat  depressing  to  learn  that  these  cells 
are  so  specialized  that  they  do  not  reproduce.  Thus  when  the  embryo  is 
given  its  store  of  neurons,  these  are  the  only  ones  we  will  ever  have.  But  cells 
die. We lose an estimated hundred thousand neurons each day so that, at
the end of a long lifetime, we have lost a very substantial number of
neurons.  Any  theory  that  purports  to  explain  the  functioning  of  the  central 
nervous  system  must  explain  how  it  continues  to  function  in  spite  of  the 
loss  of  individual  neurons. 

This is one of the facts that makes the sometimes employed analogies
between the brain and current computers really very bad. Modern
computers perform large numbers of sequential operations very rapidly and
very  accurately.  It  is  a  technological  miracle  that  so  many  operations  can 
be  performed  with  so  few  mistakes.  The  central  nervous  system  works 
slowly  with  cycle  times  that  cannot  be  shorter  than  a  few  milliseconds.  It 
is  not  very  accurate:  neurons  may  not  fire  if  they  are  tired  or  depressed  or 
just  too  lazy — in  addition,  they  die. 

It may be that current computers can do some of the things that we can
do. However, most of the things a computer can do, we do not do very
well. We are very poor at arithmetic and sequential logical operations;
this is what current computers do very well. We are very good at
recognizing things and getting a sense of what is going on. At this,
computers are very bad. So, if computers can mimic human beings by
doing some of the things we do, it is very likely that the same thing is being
done in a rather different way with rather different hardware.

Although  this  is  a  very  complex  problem,  in  a  certain  sense  there  is 
much  that  is  known.  A  set  of  coupled  non-linear  differential  equations, 
including  time  delays,  can  be  written  down  that  in  effect  summarizes 
everything that is known about the transmission of electrical signals along
excitable  membranes  and  from  axon  to  dendrite  in  a  large  coupled  and 
reentrant  network  of  neurons.  Such  systems  can  be  stationary  or  can 
evolve  in  time  by  various  learning  algorithms.  Obviously  such  a  set  of 
equations  with  no  simplification  is  extremely  complex  and  beyond  the 
capacity  of  current  analysis  or  numerical  techniques.  The  essential  point  is 
to  make  the  appropriate  approximations  and  to  clearly  illuminate  the 
paths connecting assumptions and consequences. Various approaches
have been taken, such as those of Amari (1974), Wilson and Cowan (1973),
Edelman (1981), Edelman and Reeke (1982) and
Grossberg (1982).

In our work we emphasize the transfer of information from neuron
sets to neuron sets; we propose that there is much parallel processing in
the central nervous system and that, in contrast with machine memory,
which is at present local (an event stored in a specific place) and
addressable by locality (requiring some equivalent of indices and files),
our memory is distributed and addressable by content or by association.
In addition there need be no clear separation between memory and
'logic', which is a result of association and an outcome
of the nature of the memory itself.

Such  distributed  memories  are  more  like  a  hologram  than  a  photograph. 

An  individual  synaptic  junction  holds  superimposed  information 
concerning  many  events.  In  order  to  obtain  a  single  event,  one  has  to 
gather information from many junctions. In a system like this, loss of
individual  neurons  decreases  the  signal  to  noise  ratio  but  does  not  lose 
individual  items  of  information.  Therefore,  if  it  is  overbuilt  in  the  first 
place,  one  can  retain  a  complete  memory  with  an  acceptable  signal  to 
noise  ratio  even  with  loss  of  neurons. 

These ideas are described in more detail in what follows. Although these
initial attempts are clearly oversimplifications, our hope is to capture
some of the important qualitative features of a very complex
phenomenon in a piece of structure that is clear enough so that we can
say what follows from what, and explicit enough so that we can make
contact with experiment as soon as possible.


1. Theoretical background
1.1. Distributed memory

That most intriguing aspect of memory, its persistence in spite of
continual loss of individual neurons over the lifetime of the individual, led
us early to the concept of distributed rather than local memory storage.
Distributed storage possesses in a very natural way the property of
relative invulnerability to the loss of individual storage units. We have
been analyzing a class of neural models for the acquisition and storage of
distributed memories that display, on a primitive level, features such as
recognition, association and generalization, and which suggest some of
the mental behavior associated with animal memory and learning (Cooper
(1973); Anderson and Cooper (1978)). The mechanisms we employ seem
to be plausible biologically and are not inconsistent with known
neurophysiology. In addition the networks that result seem to be a reasonable
outcome of evolutionary development under the pressure of survival.
Some of our ideas are related to or are generalizations of earlier concepts
such as perceptrons or similar models (Block (1962); Block, Knight and
Rosenblatt (1962); Minsky and Papert (1969)). In addition holographic or
non-local memories have been explored previously (Longuet-Higgins
(1968a); Longuet-Higgins (1968b)).
Although the concept of distributed mappings and memory storage is
less familiar than that of local storage, distributed mappings and their
properties have been discussed (Anderson (1970), (1972); Cooper (1973);
Kohonen (1972), (1977)) and probably have already been observed. An
example is the superior colliculus. In spite of the fact that the retinal afferents
that project to the colliculus form a very precise, fine grained map, cells
that are just a few millimeters below the very precise cells respond to
stimuli over a wide area of visual space. Thus we have the apparently
paradoxical situation (one that seems to be true of other parts of the brain as
well) that great precision of response is generated by systems composed
of cells that progressively show less and less selectivity as the motor
output of the system is approached (McIlwain (1976)).

We  believe  that  much  of  the  learning  and  resulting  organization  of  the 
central  nervous  system  occurs  due  to  modification  of  the  efficacy  or 
strength  of  at  least  some  of  the  synaptic  junctions  between  neurons,  thus 
altering the relation between pre-synaptic and post-synaptic potentials. It
is known that small but coherent modifications of large numbers of
synaptic junctions can result in distributed memories. Whether and how
such synaptic modification occurs, what precise forms it takes, and what
the physiological and/or anatomical bases of this modification are, rank
among the most interesting questions in this area.

There is direct experimental evidence that at least some modification of
synaptic strength occurs in invertebrates (Kandel (1976)) and there are
various indications that synaptic modification is a rather general
phenomenon (see, for example, Levy and Steward (1979)). In recent years
many conjectures have been made concerning the kind of modification that
might occur at synaptic junctions. Kandel and coworkers have shown that
modification of the synapse between a sensory and motor neuron of the
marine mollusk Aplysia is the basis of habituation and sensitization. The
synaptic modification they have observed can be dependent only on
pre-synaptic information. In our work, we have had to assume that synaptic
modification is a function of more general variables: local, quasi-local, and
global. The presence of quasi-local variables leads to forms of synaptic
modification (denoted as Hebbian) that depend on information not
immediately available at the synaptic site (e.g., cell firing rates).

These hypotheses have been developed in some detail (Nass and
Cooper (1975); Cooper, Liberman and Oja (1979); Bienenstock, Cooper
and Munro (1982)) and applied to experimental results that have been
obtained in visual cortex by many workers over the last generation as well
as to higher level network properties. As will be explained more fully, we
have been able to obtain agreement with classical visual cortex
experimental results and in addition have suggested new experiments. These
theoretical results have been obtained with a minimum of anatomical
details and have been primarily concerned with single neurons. It is very
likely, however, that interactions between cortical neurons play an
important role in cortical function as well, perhaps in selectivity of
individual cortical cells (Creutzfeldt et al. (1974); Sillito (1975)). In addition
individual synapses are generally either inhibitory or excitatory and not
both as some theoretical work has assumed for simplicity. One objective
of  our  current  research  is  to  extend  our  results,  taking  into  account  a 
more  realistic  anatomy  so  that  a  more  detailed  comparison  between 
theory  and  experiment  can  be  made. 

The theoretical ideas mentioned above have also led to several
suggestions for new experiments in visual cortex. In these experiments we
attempt  to  verify  predictions  concerning  the  connection  between 
specificity  and  ocular  dominance  of  cortical  cells  under  various  rearing 
conditions.  In  addition,  we  investigate  connections  between  ocular 
dominance  and  the  variety  of  visual  input  allowed  to  the  open  eye.  We 
expect  the  results  to  give  us  further  information  about  the  detailed 
mechanism  of  synaptic  modification  among  cortical  cells  as  well  as  to 
enable  us  to  determine  various  system  parameters. 


For  a  distributed  memory  it  is  the  simultaneous  or  near  simultaneous 
activities  of  many  different  neurons  (the  result  of  external  or  internal 
stimuli)  that  is  of  interest.  Thus  a  large  spatially  distributed  pattern  of 
neuron discharges, each of which might not be very far from spontaneous
activity, could contain important, if hard to detect, information. Let us
consider the behavior of an idealized neural network (that might be
regarded as a model component of a nervous system) to illustrate some of
the important features of distributed mappings.

Consider N neurons 1, 2, ..., N, each of which has some spontaneous
firing rate $r_{j0}$. (This need not be the same for all of the neurons nor need it
be constant in time.) We can then define an N-tuple whose components
are the difference between the actual firing rate $r_j$ of the jth neuron and
the spontaneous firing rate $r_{j0}$:

    $f_j = r_j - r_{j0}$ .    (1.1)

By constructing two such banks of neurons connected to one another
(or even by the use of a single bank which feeds signals back to itself), we
arrive at a simplified model as illustrated in Fig. 5.

The actual synaptic connections between one neuron and another are
Fig. 5. An ideal distributed mapping. Each of the N input neurons in F is connected to each
of the N output neurons in G by a single ideal junction. (Only the connections to i are
drawn.)

generally complex and redundant; we have idealized the network by
replacing this multiplicity of synapses between axons and dendrites by a
single ideal junction which summarizes logically the effect of all of the
synaptic contacts between the incoming axon branches from neuron j in
the F bank and the dendrites of the outgoing neuron i in the G bank (Fig. 6).
Each of the N incoming neurons, in F, is connected to each of the N
outgoing neurons, in G, by a single ideal junction.

Although the firing rate of a neuron depends in a complex and
nonlinear fashion on the presynaptic potentials, there is usually a
reasonably well defined linear region. Some very interesting network properties
are already evident in this linear region. We therefore focus our attention,
for the moment, on the region above threshold and below saturation for
which the firing rate of neuron i in G, $g_i$, is mapped from the firing rates
of all of the neurons, $f_j$, in F by:

    $g_i = \sum_{j=1}^{N} A_{ij} f_j$ .    (1.2)

[Fig. 6. Labels in the figure: G bank, F bank.]
In doing this we are regarding as important average firing rates, and time
averages of the instantaneous signals in a neuron (or perhaps a small
population  of  neurons).  We  are  further  using  the  known  integrative 
properties  of  neurons. 

We may then regard $[A_{ij}]$ (the synaptic strengths of the $N^2$ ideal
junctions) as a matrix or a mapping which takes us from a vector in the F
space to one in the G space. This maps the neural activities
$f = (f_1, f_2, \ldots, f_N)$ in the F space into the neural activities
$g = (g_1, \ldots, g_N)$ in the G space and can be written in the compact form

    $g = Af$ .    (1.3)
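
As a concrete illustration of the compact form (1.3), the following Python sketch applies a small mapping to an input pattern. The three-neuron banks, the junction strengths and the function name `map_activity` are invented for illustration and are not taken from the text.

```python
def map_activity(A, f):
    """Return g with g[i] = sum over j of A[i][j] * f[j], i.e. g = A f."""
    return [sum(a_ij * f_j for a_ij, f_j in zip(row, f)) for row in A]

# Hypothetical junction strengths A[i][j] (output neuron i, input neuron j).
A = [[0.5, 0.0, -0.25],
     [0.0, 1.0, 0.0],
     [0.25, 0.5, 0.0]]

# Hypothetical input activities, measured relative to spontaneous firing.
f = [2.0, -1.0, 4.0]

g = map_activity(A, f)
print(g)
```

Each output activity is a weighted sum over every input neuron, so no single junction determines the response by itself.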

We  propose  that  it  is  in  modifiable  mappings  of  the  type  A  that  human 
memory  is  stored.  Presently  machine  memory  is  local  (an  event  stored  in 
a  specific  place)  and  addressable  by  locality  (requiring  some  equivalent  of 
indices and files). In contrast, human memory is likely to be distributed
and addressable by content or by association. In addition for such a
memory there need be no clear separation between memory and 'logic'.

It is convenient to write the mapping, A, in the basis of vectors the
system has experienced:

    $A = \sum_{\mu\nu} c_{\mu\nu}\, g^{\mu} \times f^{\nu}$ .    (1.4)

Here $g^{\mu}$ and $f^{\nu}$ are output and input patterns of neural activity while the
$c_{\mu\nu}$ are coefficients reflecting the degree of connection between various
inputs and outputs. The symbol $\times$ represents the 'outer' product between
the input and output vectors. Although (1.4) is a well known mathematical
form, its meaning as a mapping among neurons deserves some
discussion. The ijth element of A gives the strength of the ideal junction
between the incoming neuron j in the F bank and the outgoing neuron i
in the G bank (Fig. 6).

Thus, if only $f_j$ is non-zero, $g_i$, the firing rate of the ith output neuron, is

    $g_i = A_{ij} f_j$ .    (1.5)

Since

    $A_{ij} = \sum_{\mu\nu} c_{\mu\nu}\, g_i^{\mu} f_j^{\nu}$ ,    (1.6)

the ijth junction strength is composed of a sum of the entire experience of
the system as reflected in firing rates of the neurons connected to this
junction. Each experience or association, however, is stored over the
entire array of N x N junctions. This is the essential meaning of a
distributed memory: each event is stored over a large portion of the
system, while at any particular local point many events are superimposed.
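
The superposition can be made explicit with a small numerical sketch. It assumes two stored associations with made-up integer patterns, and takes the coefficients linking each input to its own output as 1 and all others as 0; the helper names are hypothetical.

```python
def outer(g, f):
    """Outer product g x f: entry (i, j) is g[i] * f[j]."""
    return [[g_i * f_j for f_j in f] for g_i in g]

def add(X, Y):
    """Entry-by-entry sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Two hypothetical experiences: input pattern f, associated output pattern g.
f1, g1 = [1, -1, 2], [2, 1, 1]
f2, g2 = [1, 1, -1], [1, 3, -1]

# Each experience contributes to every one of the 3 x 3 junctions.
A = add(outer(g1, f1), outer(g2, f2))

# A single junction holds a sum over both experiences.
junction_00 = g1[0] * f1[0] + g2[0] * f2[0]
for row in A:
    print(row)
```

No junction in `A` can be read as belonging to one event alone: the entry in row 0, column 0, for example, is the sum of a contribution from each stored pair.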

We  show  below  that  the  non-local  mapping  A  can  serve  in  a  highly 
precise fashion as a memory that is content addressable and in which
'logic'  is  a  result  of  association  and  an  outcome  of  the  nature  of  the 
memory  itself. 

1.2.  Long  and  short-term  memory 

The $N^2$ junctions, $A_{ij}$, contain the content of the distributed memory. It
could be that a particular junction strength, $A_{ij}$, is composed of several
different components with different lifetimes, e.g.,

    $A_{ij} = A_{ij}^{(1)} + A_{ij}^{(2)} + A_{ij}^{(3)}$ ,    (1.7)

where the individual $A_{ij}^{(n)}$ might be thought of as corresponding to
different physiological or anatomical effects (e.g., changes in numbers of
presynaptic vesicles, changes in numbers of postsynaptic receptors,
changes in Ca++ levels and/or availability, anatomical changes such as
might occur in growth or shrinkage of spines). We then have the
possibility that the actual memory content (even in the absence of additional
learning) will vary with time. For a two-component system we might have

    $A_{ij}(t) = A_{ij}^{\mathrm{long}}(t) + A_{ij}^{\mathrm{short}}(t)$ ,    (1.8)

where $A_{ij}(t)$ represents the memory at some time, t, while $A_{ij}^{\mathrm{long}}$ and $A_{ij}^{\mathrm{short}}$
have long and short lifetimes. Thus in time $A_{ij}^{\mathrm{short}}$ will decay, leaving
$A_{ij}(t) = A_{ij}^{\mathrm{long}}$. Whether what is in the short-term memory component is
transferred  to  the  long-term  component  might  be  determined  by  some 
global  signal — depending  on  the  interest  of  the  information  contained  in 
the  short-term  component.  The  existence  of  such  global  signals  as  well  as 
possible  anatomical  or  physiological  correlates  of  short  or  long-term 
memory  are  the  subject  of  some  of  our  current  research. 

From  this  point  of  view  the  site  of  long  and  short-term  memory  can  be 
essentially  identical.  At  any  given  time  there  is  a  single  memory.  The 
distinction between long and short-term memory is contained in the
lifetime of the different components of $A_{ij}$.
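
A two-component junction of this kind can be sketched numerically. The text does not specify how the components decay; the example below assumes, purely for illustration, exponential decay with two very different time constants, and all amplitudes, lifetimes and names are hypothetical.

```python
import math

def junction_strength(t, a_long, a_short, tau_long, tau_short):
    """Junction strength at time t: long-lived plus short-lived component.

    Exponential decay is an assumption made for this sketch only.
    """
    long_part = a_long * math.exp(-t / tau_long)
    short_part = a_short * math.exp(-t / tau_short)
    return long_part + short_part

# Made-up amplitudes and lifetimes, in arbitrary time units.
params = dict(a_long=1.0, a_short=0.5, tau_long=1.0e6, tau_short=10.0)

early = junction_strength(0.0, **params)    # both components present
late = junction_strength(200.0, **params)   # short component has decayed
print(early, late)
```

At the later time the strength is essentially the long-lived component alone, although both components resided at the same junction throughout.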
1.3. Recognition and recollection

The fundamental problem posed by a distributed memory is the
address and accuracy of recall of the stored patterns. Consider first the
'diagonal' portion of A,

    $(A)_{\mathrm{diagonal}} \equiv \mathcal{R} = \sum_{\nu} c_{\nu\nu}\, g^{\nu} \times f^{\nu}$ .    (1.9)

An arbitrary event, e, in the external world mapped by the sensory
apparatus into the pattern of neural activity, f, will generate the response
in G


g  =  Af. 

(The pattern, f, might also be the result of some other internal pattern of
neural activity.) If we equate recognition with the strength of this
response, say the inner product (g, g), then the mapping A will distinguish
between those events it contains, the $f^{\nu}$, $\nu = 1, 2, \ldots, K$, and other events
separated from these.

The word 'separated' in the above context requires definition. Suppose
the vectors $f^{\nu}$ are thought to be independent of each other, and to satisfy
the requirements that, on the average,

    $\sum_{i=1}^{N} f_i^{\nu} = 0$ ,   $\sum_{i=1}^{N} (f_i^{\nu})^2 = 1$ .    (1.10)

Any two such vectors have components which are random with respect to
one another so that a new vector, f, presented in the F bank as above
gives a noise-like response in the G bank since on the average $(f^{\nu}, f)$ is
small. The presentation of a vector seen previously, $f^{\lambda}$, however, gives the
response in the G bank

    $\mathcal{R} f^{\lambda} = c_{\lambda\lambda}\, g^{\lambda} + \text{noise}$ .    (1.11)

It can be shown that if the number of imprinted events, K, is small
compared to the dimensionality, N, the signal-to-noise ratios are
reasonable.
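
The signal-to-noise statement can be illustrated numerically. The following is a minimal sketch, not the author's calculation: it assumes all $c_{\nu\nu} = 1$, draws the patterns from a seeded Gaussian generator, and uses hypothetical helper names (`random_pattern`, `build_memory`, `strength`). It compares the recognition strength (g, g) for stored patterns with that for patterns never imprinted.

```python
import math
import random

def random_pattern(n, rng):
    """Random vector with zero mean and unit length, as required by (1.10)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(v) / n
    v = [x - mean for x in v]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_memory(gs, fs):
    """R = sum over nu of g^nu x f^nu (assumption: every c_nu_nu equals 1)."""
    n_out, n_in = len(gs[0]), len(fs[0])
    R = [[0.0] * n_in for _ in range(n_out)]
    for g, f in zip(gs, fs):
        for i in range(n_out):
            gi = g[i]
            row = R[i]
            for j in range(n_in):
                row[j] += gi * f[j]
    return R

def strength(R, f):
    """Recognition strength (g, g) for the response g = R f."""
    g = [dot(row, f) for row in R]
    return dot(g, g)

rng = random.Random(0)
N, K = 200, 10                      # dimensionality and number of imprinted events
fs = [random_pattern(N, rng) for _ in range(K)]
gs = [random_pattern(N, rng) for _ in range(K)]
R = build_memory(gs, fs)

stored_strength = sum(strength(R, f) for f in fs) / K
novel = [random_pattern(N, rng) for _ in range(K)]
novel_strength = sum(strength(R, f) for f in novel) / K
print(stored_strength, novel_strength)
```

Under these assumptions the stored patterns should give a strength close to one, while novel patterns should give a strength of order K/N, so the separation improves as K becomes small compared to N.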

If  we  define  separated  events  as  those  which  map  into  orthogonal 
vectors,  then  clearly  a  recognition  matrix  composed  of  K  orthogonal 
vectors $f^{1}, f^{2}, \ldots, f^{K}$,

$$\mathcal{R} = \sum_{\nu=1}^{K} c_{\nu\nu}\, g^{\nu} \times f^{\nu}, \tag{1.12}$$

will  distinguish  between  those  vectors  contained  and  all  vectors  separated 
from  (perpendicular  to)  these.  Further,  the  response  of  the  system  to  a 
vector previously recorded is unique and completely accurate:

$$\mathcal{R} f^{\lambda} = c_{\lambda\lambda}\, g^{\lambda}. \tag{1.13}$$

In  this  special  situation,  the  distributed  memory  is  as  precise  as  a  localized 
memory. 

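In addition, this type of memory has the interesting property of
recalling an entire associated vector $g^{\lambda}$ even if only part of
$f^{\lambda}$ is presented. Let

$$f^{\lambda} = f_1^{\lambda} + f_2^{\lambda}. \tag{1.14}$$

If only part of $f^{\lambda}$, say $f_1^{\lambda}$, is presented, we obtain

$$\mathcal{R} f_1^{\lambda} = c_{\lambda\lambda}\, (f_1^{\lambda}, f_1^{\lambda})\, g^{\lambda} + \text{noise}. \tag{1.15}$$

The result is the entire response to the full $f^{\lambda}$ with a reduced
coefficient plus noise.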
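
A small worked example may make the orthogonal case and the partial-cue property concrete. This is an illustrative sketch under stated assumptions: the three input patterns are rows of an 8-point Hadamard matrix scaled to unit length, the outputs are unit basis vectors, all $c_{\nu\nu} = 1$, and the partial cue is the first pattern with its last two components set to zero. None of these choices come from the text.

```python
import math

# Three mutually orthogonal input patterns (rows of an 8-point Hadamard matrix).
H = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, -1, 1, 1, -1, -1, 1],
]
scale = 1.0 / math.sqrt(8)
fs = [[scale * x for x in row] for row in H]

# Output patterns chosen as unit basis vectors so each response component
# reads off the coefficient of one g^nu directly.
gs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def build_memory(gs, fs):
    """R = sum over nu of g^nu x f^nu (assumption: every c_nu_nu equals 1)."""
    R = [[0.0] * len(fs[0]) for _ in range(len(gs[0]))]
    for g, f in zip(gs, fs):
        for i, gi in enumerate(g):
            for j, fj in enumerate(f):
                R[i][j] += gi * fj
    return R

def respond(R, f):
    return [sum(a * x for a, x in zip(row, f)) for row in R]

R = build_memory(gs, fs)

full = respond(R, fs[0])               # exact recall, as in (1.13)
partial_cue = fs[0][:6] + [0.0, 0.0]   # only part of the first pattern
partial = respond(R, partial_cue)      # reduced coefficient plus noise, as in (1.15)
print(full, partial)
```

With the full pattern the response is $g^{1}$ alone. With the partial cue the coefficient of $g^{1}$ drops to 0.75, which equals the inner product of the cue with itself, and a smaller spurious component of 0.25 appears along $g^{3}$.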

1.4.  Association 

If we now take the point of view that presentation of the event $e^{\nu}$,
which generates the vector $f^{\nu}$, is recollected if

$$\mathcal{R} f^{\nu} = c_{\nu\nu}\, g^{\nu} + \text{noise}, \tag{1.16}$$

then the off-diagonal terms

$$(A)_{\text{off-diagonal}} \equiv \mathcal{A} = \sum_{\mu \neq \nu} c_{\mu\nu}\, g^{\mu} \times f^{\nu} \tag{1.17}$$

may be interpreted as containing associations between events initially
separated from one another.

For such terms the presentation of event $e^{\nu}$ will generate not only $g^{\nu}$
(which is equivalent to the recollection of $e^{\nu}$) but also, and perhaps more
weakly, $g^{\mu}$, which should result with the presentation of $e^{\mu}$. Thus, for
example, if $g^{\mu}$ will initiate some response, originally a response to $e^{\mu}$,
the presentation of $e^{\nu}$ when $c_{\mu\nu} \neq 0$ will also initiate this response.

Fig. 7. An ideal association.

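We can thus divide the association matrix A into two parts:

$$A = \sum_{\mu\nu} c_{\mu\nu}\, g^{\mu} \times f^{\nu} = \mathcal{R} + \mathcal{A}, \tag{1.18}$$

where

$$\mathcal{R} = (A)_{\text{diagonal}} = \sum_{\nu} c_{\nu\nu}\, g^{\nu} \times f^{\nu}, \tag{1.19}$$

$$\mathcal{A} = (A)_{\text{off-diagonal}} = \sum_{\mu \neq \nu} c_{\mu\nu}\, g^{\mu} \times f^{\nu}. \tag{1.20}$$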

The $c_{\mu\nu}$ are then the direct recollection and association coefficients. Some
of the consequences of the properties discussed in the last two sections
are the subject of some of our current and continuing research and are
further discussed in Subsections 1.5, 1.6, and 1.7.

1.5. Network modification and learning

The properties described above require coherence among many synaptic
junctions. We therefore ask: According to what rule and by what
means  do  neurons  modify  themselves  to  form  a  matrix  of  junctions  with 
the  properties  of  memory?  A  major  effort  of  our  research  is  to  elucidate 
this  question. 

Such a modification rule can be cast in the form of stochastic or
deterministic differential equations dependent on variables that we
classify as local, quasi-local and global:

$$\dot{A}_{ij} = \Phi(g_i, f_j, A_{ij}, \ldots). \tag{1.21}$$

Different such rules lead to various types of memories. In the following
sections several rules for plasticity will be examined. The
recollection-association memory (1.18) described above is obtained from
the following simple bilinear modification rule:

$$\delta A_{ij} \sim g_i f_j. \tag{1.22}$$

This $\delta A_{ij}$ is proportional to the product of the differences between the
actual and the spontaneous firing rates in the pre- and post-synaptic
neurons (j and i, respectively). (This is one realization of Hebb's form of
synaptic modification (Hebb (1949)).) The addition of such changes to A for
all associations $g^{\mu} \times f^{\nu}$ results finally in a mapping with the properties
discussed in the previous sections.
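
The way repeated bilinear increments accumulate into a mapping of the form (1.18) can be sketched as follows. This is an illustration under assumptions not taken from the text: the input patterns are unit basis vectors, the coefficients 1.0 and 0.3 are arbitrary, and `hebbian_update` is a hypothetical helper name.

```python
def hebbian_update(A, g, f, eta):
    """Add eta * g_i * f_j to every A[i][j], in place (rule (1.22))."""
    for i, gi in enumerate(g):
        for j, fj in enumerate(f):
            A[i][j] += eta * gi * fj

def respond(A, f):
    return [sum(a * x for a, x in zip(row, f)) for row in A]

# Two separated input patterns and two output patterns (illustrative values).
f1, f2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
g1, g2 = [1.0, 0.0], [0.0, 1.0]

A = [[0.0] * 3 for _ in range(2)]
hebbian_update(A, g1, f1, 1.0)   # direct recollection term, c_11 = 1
hebbian_update(A, g2, f2, 1.0)   # direct recollection term, c_22 = 1
hebbian_update(A, g2, f1, 0.3)   # association term, c_21 = 0.3

out1 = respond(A, f1)   # g1 plus a weaker g2
out2 = respond(A, f2)   # g2 only
print(out1, out2)
```

Presenting the first pattern yields its own response together with a weaker copy of the associated response, while the second pattern, which has no off-diagonal term, yields only its own response.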

Synaptic  modification  dependent  on  inputs  alone,  of  the  type  already 
directly observed in Aplysia (Kandel and Tauc (1965); Castellucci and
Kandel (1974)), is sufficient to construct a simple memory, one that
distinguishes what has been seen from what has not, but does not easily
separate  one  input  from  another.  To  construct  a  mapping  of  the  form 
above,  however,  requires  synaptic  modification  dependent  on  information 
that  exists  at  different  places  on  the  neuron  membrane,  what  we  call  two 
(or  higher)  point  modification. 

In order that this take place, information must be communicated from,
for example, the axon hillock to the synaptic junction to be modified. This
implies the existence of a means of internal communication of
information within a neuron, in the above example in a direction opposite
to the flow of electrical signals (Cooper (1975)). The junction ij, for
example, must have information of the firing rate $f_j$ (which is locally
available) as well as the firing rate $g_i$, which is somewhat removed
(Fig. 8).

One possibility could be that the integrated electrical signals from the
dendrites produce a chemical or electrical response in the cell body which
controls the spiking rate of the axon and at the same time communicates
(by backward spiking, for example) to the dendrite ends the information
of the integrated slow potential. Another possibility is that dendritic
shafts act as somewhat independent units so that the local integrated
dendritic potentials interacting with the potentials incoming to the
individual spines combine to produce changes in spine shape and resistivity.
Such  changes  might  be  observable  in  anatomical  studies  and  are  the 
subject  of  one  of  our  current  research  projects. 

One  might  guess  that  once  the  physiological  mechanism  for  such 
communication was available, different types of two (or higher) point
modification  evolved  in  various  ways.  It  is  tempting  to  conjecture  that  a 
liberating  evolutionary  step  was  just  the  development  of  this  means  of 
internal  communication  that,  coupled  with  the  ability  of  synapses  to 
modify,  created  the  possibility  for  a  new  organization  principle. 

There is a variety of means by which the coefficient $A_{ij}$ might be
modified, given that the necessary information is available at the ijth
junction. Among these might be growth of additional or change in
electrical properties of dendrite spines, addition of new synaptic
junctions, activation of synaptic junctions previously inactive, changes in
membrane  resistivity  and/or  changes  in  the  amount  of  transmitter  or 
receptor  in  a  synapse.  Although  some  structural  changes  have  been 
observed,  there  is  little  evidence  yet  to  choose  among  the  possibilities 
mentioned  above.  This  is  the  subject  of  much  current  research. 

1.6. Passive modification

To make the modification

$$\delta A \sim g^{\mu} \times f^{\nu} \tag{1.23}$$

by any of the mechanisms suggested above, the system must have the
signal distribution $f^{\nu}$ in its F bank and $g^{\mu}$ in its G bank. It is easy to
obtain $f^{\nu}$ since this is mapped from either an external event or is some
internal pattern. But to get $g^{\mu}$ in the G bank is more difficult since this in
effect is what the system is trying to learn.

In what we denote as active learning, the system is presented with some
$f^{\nu}$, searches for a response, and is given some indication of when it is
coming closer. When by some procedure or another it finds the 'right'
response, say $g^{\mu}$, it is 'rewarded' and responds to the reward by printing
into A the information:

$$\delta A_{ij} = \eta\, g_i^{\mu} f_j^{\nu}. \tag{1.24}$$

(The information is available at the time of the reward since at that time
the system is mapping $f^{\nu}$, responding $g^{\mu}$, and thus has just the desired
spiking frequencies in the F and G banks of neurons.) Active learning
probably  describes  a  type  of  learning  in  which  a  system  response  to  an 
input  is  matched  against  an  expected  or  desired  response  and  judged 
correct  or  incorrect. 

However,  there  is  a  type  of  learning  that  does  not  seem  from  visible 
external  indications  to  require  this  type  of  a  search  procedure.  It  is  the 
type  of  learning  in  which,  as  far  as  can  be  seen,  an  animal  is  placed  in  an 
environment  and  seems  to  learn  to  recognize  and  to  recollect  in  a  far 
more  passive  manner. 

To arrive at an algorithm which produces what we call passive learning,
we  utilize  a  distinction  between  forming  an  internal  representation  of 
events  in  the  external  world  as  opposed  to  producing  a  response  to  these 
events  that  is  matched  against  what  is  expected  or  desired  in  the  external 
world. 

The simple but important idea is that the internal electrical activity that
in one mind signals the presence of an external event is not necessarily (or
likely to be) the same electrical activity that signals the presence of that
same event for another mind. There is nothing that requires that the same
external event be mapped into the same neural patterns by different
animals. The event $e^{\nu}$, which for one animal is mapped into the signal
distributions $f^{\nu}$ and $g^{\nu}$, in another animal is mapped into $f'^{\nu}$ and
$g'^{\nu}$. What is required for eventual agreement between animals in their
description of the external world is not that the electrical signals mapped
be identical but rather that the relation of the signals to each other and
to events in the external world be the same (Fig. 9).

If we now allow the output of a cell to be determined by the input to
that cell and the already existing synaptic junction strengths, as well as by
possible noise-like fluctuations (making no prior requirement on what the
output should be), we arrive at a mathematical formulation of what we
call passive modification (Cooper (1973)):

$$\delta A_{ij} \sim g_i f_j = \sum_{k=1}^{N} A_{ik} f_k f_j. \tag{1.25}$$

Fig. 9. Representations in two different systems of the same external fabric
of events. The two representations are not identical, but they each stand in
a one-to-one relation to the external fabric and to each other.

It has been shown in the above reference that with a simple form of
passive modification a system generates its own response to incoming
patterns in such a way as to construct distributed mappings that can
function as memories capable of recognition and association. To a limited
extent these mappings can be regarded as internal representations of what
has arrived from the outside world. It has further been shown (Nass and
Cooper (1975)) that a form of passive modification can result in the
formation of feature detectors or threshold response units which learn to
respond to repeated patterns even in the absence of any initial bias. Such
units can serve to perform some nonlinear separations.
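
The bare passive rule, in which the output is simply whatever the current mapping produces, can be sketched in a few lines. This is an illustration only: the initial matrix, the learning rate `eta`, and the two patterns are arbitrary assumptions, and the specific form analysed in Cooper (1973) may include further terms that are not shown here.

```python
def respond(A, f):
    return [sum(a * x for a, x in zip(row, f)) for row in A]

def passive_step(A, f, eta):
    """One passive update: delta A_ij = eta * g_i * f_j, where g = A f."""
    g = respond(A, f)
    for i, gi in enumerate(g):
        for j, fj in enumerate(f):
            A[i][j] += eta * gi * fj

# Small arbitrary initial junction strengths (illustrative values).
A = [[0.1, 0.2, -0.1],
     [0.0, 0.1, 0.3]]
f = [0.6, 0.8, 0.0]       # repeated input pattern, unit length
other = [0.0, 0.0, 1.0]   # pattern orthogonal to f, never presented

before_f = respond(A, f)
before_other = respond(A, other)
for _ in range(5):
    passive_step(A, f, eta=0.5)
after_f = respond(A, f)
after_other = respond(A, other)
print(before_f, after_f, before_other, after_other)
```

For a unit-length input each presentation multiplies the response to that input by (1 + eta), so after five presentations the response has grown by a factor of 1.5 to the fifth power, while the response to the orthogonal pattern is unchanged. In this bare form the growth is unbounded, so any practical use of the sketch would need some decay or normalization, which is left out here.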

More  detailed  discussion  of  the  consequences  of  these  modification 
procedures  and  the  properties  of  some  of  the  mappings  that  result  is 
contained  in  the  references  cited  above.  The  application  of  these  ideas  to 
visual cortical cells is discussed in Section 2.

1.7. Feature abstraction

Some  networks  of  neurons  must  have  the  ability  to  extract  meaningful 
information from a broad range of input environments. In the case of
sensory  input  to  cortex,  for  example,  the  system's  range  is  internally 
constrained  by  the  response  characteristics  of  the  sensory  neurons  and 
externally  by  the  nature  of  the  stimulus  environment.  This  stimulus 
environment  depends  a  great  deal  upon  the  nature  of  the  creature's 
surroundings.  Precise  statements  regarding  aspects  of  environmental 
structure  relevant  to  mathematical  models  are  given  in  the  next  section. 

Consider the recognition-association memory (1.18) described above. In
actual experience, the events to which the system is exposed are not in
general highly separated nor are they independent in a statistical sense.
There is no reason, therefore, to expect that all vectors, $f^{\nu}$, printed into A
according to the modification rule (1.25) would be orthogonal or even
very far from one another. Rather it seems likely that often large
numbers of these vectors would lie close to one another. Under these
circumstances, a distributed memory might be 'confused' in the sense that
it will respond to new events as if they were old, if the new event is close
to an old one. It will 'recognize' and 'associate' events never, in fact, seen
or associated before.

The memory will tend to categorize stimuli on the basis of the past
history of the system. For example, suppose a number of vectors in the
memory are of the form

$$f^{\nu} = f^{0} + n^{\nu}, \tag{1.26}$$

where $n^{\nu}$ varies randomly; $f^{0}$ will eventually be recognized more strongly
than any particular $f^{\nu}$ actually presented. This, of course, is reminiscent of
psychological properties called 'generalization' or 'abstraction'. From
such a point of view, generalization grows from the loss of detail of
individual instances, a trade-off that seems characteristic of distributed
systems.
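
The claim that the central vector comes to be recognized more strongly than any stored instance can be checked with a small simulation. This sketch rests on assumptions that are not in the text: the output patterns are taken to be orthonormal with unit coefficients, so the recognition strength reduces to a sum of squared overlaps; every probe is normalized to unit length before presentation; and the sizes `N`, `K` and the noise level `sigma` are arbitrary.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def recognition_strength(stored, x):
    """(g, g) for g = R x with R = sum over nu of g^nu x f^nu.

    Assumption: the g^nu are orthonormal, so the strength reduces to
    the sum over nu of (f^nu, x)^2. The probe x is normalized first.
    """
    x = unit(x)
    return sum(dot(f, x) ** 2 for f in stored)

rng = random.Random(1)
N, K, sigma = 50, 30, 0.15

# Central vector f^0 and K noisy instances f^nu = f^0 + n^nu, as in (1.26).
f0 = unit([rng.gauss(0.0, 1.0) for _ in range(N)])
stored = [[c + rng.gauss(0.0, sigma) for c in f0] for _ in range(K)]

prototype_strength = recognition_strength(stored, f0)
member_strengths = [recognition_strength(stored, f) for f in stored]
print(prototype_strength, max(member_strengths))
```

The central vector was never imprinted, yet under these assumptions it should evoke a stronger response than any of the instances that were, because the random parts of the instances overlap only weakly with one another while their common part adds coherently.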

We have here an explicit realization of feature abstraction. This
generalizing quality might be described as the result of a built-in directive
for inductive logic. The associative memory by its nature takes the step

$$f^{0} + n^{1},\; f^{0} + n^{2},\; \ldots,\; f^{0} + n^{K} \;\rightarrow\; f^{0}, \tag{1.27}$$

which one perhaps attempts to describe in language as passing from
particulars: cat^1, cat^2, cat^3, ... to the general: cat.



How fast this step is taken depends on the parameters of the system.
By altering these parameters, it is possible to construct mappings which
vary  from  those  which  retain  all  particulars  to  which  they  are  exposed,  to 
those  which  lose  the  particulars  and  retain  only  common  elements — the 
central  vector  of  any  class. 

In addition to 'errors' of recognition, the associative memory also
makes 'errors' of association. If, for example, all (or many) of the vectors
of the class $\{f^{\alpha}\}$, defined as a class of vectors not very separated from
one another, associate some particular $g^{\beta}$, so that the mapping contains
terms of the form

$$\sum_{\nu=1}^{K} c_{\beta\nu}\, g^{\beta} \times f^{\nu}, \qquad f^{\nu} \in \{f^{\alpha}\}, \tag{1.28}$$

with $c_{\beta\nu} \neq 0$ over much of $\nu = 1, 2, \ldots, K$, then the new event
$e^{K+1}$, which maps into $f^{K+1}$, also in the class $\{f^{\alpha}\}$, will not only be
recognized, the inner product $(\mathcal{R} f^{K+1}, \mathcal{R} f^{K+1})$ being large, but will
also associate $g^{\beta}$, $\mathcal{A} f^{K+1} = c\, g^{\beta}$, as strongly as any of the vectors
$f^{1}, \ldots, f^{K}$ explicitly contained in (1.28).

If  errors  of  recognition  lead  to  the  process  described  in  language  as 
going  from  particulars  to  the  general,  errors  of  association  might  be 
described as going from particulars to a universal: cat^1 meows, cat^2
meows, ... $\rightarrow$ all cats meow.

Whatever efficacy this inductive process has will depend on the order
of the world in which the animal system finds itself. If the world is
properly ordered, an animal system that 'jumps to conclusions' in the
sense above may be better able to adapt and react to the hazards of its
environment and thus survive.

By  a  sequence  of  mappings  of  the  form  above  (or  by  feeding  the  output 
of  A  back  to  itself)  one  obtains  a  fabric  of  events  and  connections  that  is 
rich  as  well  as  suggestive.  One  easily  sees  the  possibility  of  a  flow  of 
electrical  activity  influenced  both  by  internal  distributed  mappings  and 
the external input. This flow is governed not only by direct association
coefficients (which can be explicitly learned) but also by indirect
associations due to the overlapping of the mapped events. One can
imagine situations arising in which direct access to an event, or a class of
events, has been lost while the existence of this event or class of events in
A influences the flow of electrical activity.

One problem in making the identifications suggested above is that such
systems tend to form excessively large all-encompassing classes. But
means have been devised to limit the extent of class formation. In fact
such mappings can be made to separate classes as well as to unite them
(Kohonen (1977); Cooper, Liberman and Oja (1979) (CLO);
Bienenstock, Cooper and Munro (1982) (BCM)).

Another  problem  is  a  direct  consequence  of  the  assumption  of  the 
linearity of the system. Any state is generally a superposition of various
vectors.  Thus  one  has  to  find  a  means  by  which  events — or  the  entities 
into  which  they  are  mapped — are  distinguished  from  one  another. 

There are various possibilities; neurons are so non-linear that it is not at
all difficult to imagine non-linear or threshold devices that would separate
one vector from another. Such separation processes complement
generalization processes in that they bring out the differences in an input
environment while generalizing cells tune to the component most common
to the constituent stimuli. But the occurrence of a vector in a distributed
memory is a set of signals over a large number of neurons, each of which
is far from threshold. A basic problem, therefore, is how to associate the
threshold of a single cell or a group of cells with such a distributed signal.
One way this might come about has been shown by Nass and Cooper
(1975). Another possibility is the stochastic process recently discussed by
Hopfield (1982).

In addition to the appearance of 'pontifical' cells or groups of cells,
there will be a certain separation of mapped signals due to actual
localization of the areas in which these signals occur. For example, optical
and auditory signals are subjected to much processing before they
actually meet in cortex. It is possible to imagine that identification of optical
or auditory signals (as optical or auditory) occurs first from where they
appear and their immediate cluster associations. Connections between
an optical and an auditory event might occur as suggested in Fig. 10.
Although the systems described above are relatively primitive, they
suggest various psychological properties and are used in our research to
construct models of some aspects of behavior and language learning.

2. Application to visual cortex: Comparison of theory with experiment

2.1. Summary of related visual cortex experimental data

The discussion above leads to a central issue: what is the principle of
local organization that, acting in a large network, can produce the
observed complex behavior of higher mental processes? There is no need
to assume that such a mechanism (believed to involve synaptic
modification) operates in exactly the same manner in all portions of the
nervous system or in all animals. However, one would hope that certain
fundamental similarities exist so that a detailed analysis of the properties
of this mechanism in one preparation would lead to some conclusions that
are generally applicable. We are interested in visual cortex because the
vast amount of experimental work done in this area of the brain,
particularly area 17 of cat and monkey, strongly indicates that one is
observing a process of synaptic modification dependent on the information
locally and globally available to the cortical cells.

Experimental work of the last generation, beginning with the pathbreaking
work of Hubel and Wiesel (1959, 1962), has shown that there exist cells
in visual cortex (areas 17, 18, and 19) of the adult cat that respond in a
precise and highly tuned fashion to external patterns, in particular bars or
edges of given orientation and moving in a given direction. Much further
work (Blakemore and Cooper (1970); Blakemore and Mitchell (1973); Hirsch
and Spinelli (1971); Pettigrew and Freeman (1973)) has been taken to
indicate that the number and response characteristics of such cortical cells
can be modified. It has been observed in particular (Imbert and Buisseret
(1975); Blakemore and Van Sluyters (1975); Buisseret and Imbert (1976); and
Fregnac and Imbert (1977, 1978)) that the relative number of cortical cells
that are highly specific in their response to visual patterns varies in a very
striking way with the visual experience of the animal during the critical
period.

Most kittens first open their eyes at the end of the first week after birth.
It is not easy to assess whether or not orientation selective cells exist at
that time in striate cortex: few cells are visually responsive, and the
response's main characteristics are generally 'sluggishness' and
fatigability. However, it is quite generally agreed that as soon as cortical
cells are reliably visually stimulated (e.g., at 2 weeks), some are
orientation selective, whatever the previous visual experience of the animal
(cf. Hubel and Wiesel (1963); Blakemore and Van Sluyters (1975); Buisseret
and Imbert (1976); Fregnac and Imbert (1978)).

Orientation selectivity develops and extends to all visual cells in area 17
if the animal is reared, and behaves freely, in a normal visual environment
(NR): complete 'specification' and normal binocularity (about 80%
of responsive cells) are reached at about 6 weeks of age (Fregnac and
Imbert (1978)). However, if the animal is reared in total darkness from
birth to the age of 6 weeks (DR), none or few orientation selective cells
are then recorded (from 0 to 15%, depending on the authors and the
classification criteria); however, the distribution of ocular dominance
seems unaffected (Blakemore and Mitchell (1973); Imbert and Buisseret
(1975); Blakemore and Van Sluyters (1975); Buisseret and Imbert (1976);
Leventhal and Hirsch (1980); Fregnac and Imbert (1978)). In animals
whose eyelids have been sutured at birth, and which are thus binocularly
deprived of pattern vision (BD), a somewhat higher proportion (from 12
to 50%) of the visually excitable cells are still orientation selective at 6
weeks (and even beyond 24 months of age) and the proportion of
binocular cells is less than normal (Wiesel and Hubel (1965); Blakemore
and Van Sluyters (1975); Kratz and Spear (1976); Leventhal and Hirsch
(1977); Watkins et al. (1978)).

Imbert and Buisseret have classified cortical cells that respond to visual
stimuli into three groups: aspecific, immature, and specific. They, and
Fregnac and Imbert, have measured the relative proportions of these groups
depending on the visual experience of the animal. The distribution of the
different cell types in three age groups is shown in Fig. 11.

Examination of these results, which were obtained from the study of
1050 cells, shows that cells having some of the highly specific response
properties of adult visual cortical neurons, especially concerning
orientation selectivity, are present in the earliest stages of post-natal
development independent of visual experience (Fregnac and Imbert (1977,
1978)). However, visual experience between 17 and 70 days is critical in
determining the evolution of these cells. Animals reared normally showed a
marked increase in the number of specific cells as compared with
aspecific. (The period between 17 and 28 days is usually sufficient to reach
the normal adult level of specificity.) The reverse is true for animals
reared in the dark. A statistical analysis of this evolution, performed by
Fregnac (1978), shows clearly the striking dependence of the ratio of
sharply tuned to broadly tuned cells on the experience of the animal.

In addition, as has been shown by Imbert and Buisseret (1975), Buisseret
and Imbert (1976) and Buisseret et al. (1978), as little as six hours of
normal visual experience at about 42 days of age can alter in a striking
fashion the ratio of specific or immature to aspecific cells (Fig. 12). That
such a short visual experience can change the tuning ratios so markedly is
clear evidence of the great plasticity of these cortical cells at the height of
the critical period.

Of all visual deprivation paradigms, putting one eye in a competitive
advantage over the other has probably the most striking consequences. If
monocular lid-suture (MD) is performed during a 'critical' period (ranging
from about 4 weeks to about 12 weeks), there is a rapid loss of
binocularity to the profit of the open eye (Wiesel and Hubel (1963, 1965)).
At this stage, opening the closed eye and closing the experienced one may
result in a complete reversal of ocular dominance (Blakemore and Van
Sluyters (1974)). A disruption of binocularity that does not favor one of
the eyes may be obtained, for example, by provoking an artificial
strabismus (Hubel and Wiesel (1965)) or by an alternating monocular
occlusion, which gives both eyes an equal amount of visual stimulation
(Blakemore (1976)). In what follows, we call this uncorrelated rearing
(UR).

These results seem to us to provide direct evidence for the modifiability
of the response of single cells in the cortex of a higher mammal according
to its visual experience. Depending on whether or not patterned visual
information is part of the animal's experience, the specificity of the
response of cortical neurons varies widely. Specificity increases with
normal patterned experience. Deprived of normal patterned information
(dark-reared or lid-sutured at birth, for example) specificity decreases.
Further, even a short exposure to patterned information after six weeks
of dark-rearing can reverse the loss of specificity and produce an almost
normal distribution of cells.

Fig. 12. Distribution in percentage of the three types of visual cortical
units (area 17) recorded after 6 hours of visual exposure for 6-week-old
dark-reared kittens. Columns: I, dark-reared kittens; IV, normally reared
kittens. During 6 hours of exposure, conditions were: in II and III, freely
moving; in III, 12 hours in the dark followed the 6 hours of exposure.
Numbers of visual cells recorded are given under each column. Specific
cells (cross-hatched) are activated by oriented stimuli within a sharp angle
(<60°). Immature cells (diagonal stripes) are activated by oriented stimuli
within a larger angle (<150°). Nonspecific cells (open) are activated by
nonoriented stimuli moving in any direction. A statistical analysis reveals
no significant difference in the percentage of immature and specific units
between columns III and IV. Therefore it may be that for a 6-week-old
dark-reared kitten, a 6-hour exposure to visual input followed by 12 hours
in the dark is sufficient to produce a distribution of cortical cells similar
to that of normally reared animals. (From Buisseret et al. (1978).)

We do not claim and it is not necessary that all neurons in visual cortex
be so modifiable. Nor is it necessary that modifiable neurons are
especially important in producing the architecture of visual cortex. It is our
hope that the general form of modifiability we require to construct
distributed mappings manifests itself for at least some cells of visual
cortex that are accessible to experiment. We thus make the conservative
assumption that biological mechanisms, once established, will manifest
themselves in more or less similar forms in different regions. If this is the
case, modifiable individual neurons in visual cortex can provide evidence
for such modification more generally.

2.2.  Modification  of  cortical  synapses:  global  and  local  variables 

To  apply  the  general  theoretical  ideas  of  the  previous  section  to  visual 
cortex,  we  introduce  the  following  notation.  Consider  a  cortical  cell  as 
shown  in  Fig.  13: 


Fig. 13. A model neuron which processes the input d(t) according to the
synaptic weights m(t) to yield the response c(t).

Replacing  equations  (1.1)  and  (1.2)  we  write 

c(r)=  2  '",('><(,(/).  (2.1) 

/ 

where  c(t)  is  the  output  at  time  t,  m,(f)  is  the  efficacy  of  the  yth  synapse  at 
time  t.  d,(t)  is  the  jx\f  component  of  the  input  at  time  t  (the  firing 
frequency  of  the  yth  presynaptic  neuron)  and  S,  denotes  summation  over 
/,  i.e..  over  all  presynaptic  neurons.  We  can  then  write: 

$$m(t) = (m_1(t), m_2(t), \ldots, m_N(t)),$$

$$d(t) = (d_1(t), d_2(t), \ldots, d_N(t)), \qquad (2.2)$$

$$c(t) = m(t) \cdot d(t).$$

$m(t)$ and $d(t)$ are real-valued vectors of the same dimension, $N$, i.e., the
number of ideal synapses onto the neuron, and $c(t)$ is the inner product
(or dot product) of $m(t)$ and $d(t)$. The vector of synaptic efficacies at
time $t$, $m(t)$, is called the state of the neuron at time $t$. (Note that $c(t)$ as
well as all components of $d(t)$ represent firing frequencies that are
measured from the level of average spontaneous activity; thus they might
take negative as well as positive values; $m_j(t)$ is dimensionless.)
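
As a minimal sketch of the model neuron just defined, the response is simply the dot product of the state and the input. The function name `response` and the numerical values are illustrative assumptions, not taken from the original work.

```python
import numpy as np

def response(m, d):
    """Output c(t) = m(t) . d(t) of the linear model neuron, Eq. (2.1)."""
    m = np.asarray(m, dtype=float)
    d = np.asarray(d, dtype=float)
    if m.shape != d.shape:
        raise ValueError("m and d must have the same dimension N")
    return float(m @ d)

# Illustrative values only: N = 3 synapses, one of them with negative efficacy.
m_example = [0.5, -0.2, 0.1]
d_example = [1.0, 2.0, 3.0]   # firing rates measured relative to spontaneous activity
c_example = response(m_example, d_example)
```

Because inputs and outputs are measured from the spontaneous level, negative entries in `d` or a negative `c_example` are legitimate values in this sketch.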

We can now formulate the question of what the local principle of
organization is by asking what the change in time of $m_j(t)$ is (the $j$th
synapse onto the cortical cell, receiving input $d_j(t)$) and on what variables
this change depends.

The  various  factors  that  influence  synaptic  modification  may  be  divided 
broadly  into  two  classes — those  dependent  on  global  and  those  dependent 
on  local  information.  Presumably,  global  information  in  the  form  of 
chemical or electrical signalling influences most (or all) modifiable
junctions of a given type in a given area in the same way. Evidence for the
existence of global factors that affect development may, for instance, be
found in Kasamatsu and Pettigrew (1976, 1979), Singer (1979, 1980),
Buisseret et al. (1978), Bear and Daniels (1983) and
Bear et al. (1983). On the other hand, local information
available at each modifiable synapse can influence each
junction in a different manner.

An  early  proposal  as  to  how  local  information  could  affect  synaptic 
modification was made by Hebb (1949). His now-classical principle was
suggested as a possible neurophysiological basis for operant conditioning:
"when  an  axon  of  cell  A  is  near  enough  to  excite  a  cell  B  and  repeatedly 
or  persistently  takes  part  in  firing  it,  some  growth  process  or  metabolic 
change  takes  place  in  one  or  both  cells  such  that  A’s  efficiency,  as  one  of 
the  cells  firing  B,  is  increased."  Thus  the  increase  of  the  synaptic  strength 
connecting  A  to  B  is  dependent  upon  the  correlated  firing  of  A  and  B. 
Such  a  correlation  principle  has  inspired  the  work  of  many  theoreticians 
on various topics related to learning, associative memory, pattern
recognition, organization of neural mappings (retinotopic projections) and
development  of  selectivity  of  cortical  neurons. 

It  is  fairly  clear  that  in  order  to  actually  use  Hebb's  principle  one  must 
state  conditions  for  synaptic  decrease  as  specific  as  those  for  synaptic 
increase:  if  synapses  are  allowed  only  to  increase,  all  synapses  will 
eventually  saturate;  no  information  will  be  stored  and  no  selectivity  will 
develop (see, for example, Sejnowski (1977a,b)). What is required is thus a
complementary  statement  to  Hebb's  principle  giving  conditions  for 
synaptic  decrease.  Such  a  statement  is  given  in  what  follows. 
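
The saturation argument can be illustrated with a short sketch of an increase-only correlation rule. The update `dm = eta * c * d`, the two positive input patterns and all parameter values are assumptions chosen for illustration; they are not the rule studied in this paper.

```python
import numpy as np

def hebbian_growth(m0, patterns, eta=0.01, steps=500, seed=0):
    """Increase-only Hebbian rule: dm = eta * c * d, with no decrease term."""
    rng = np.random.default_rng(seed)
    order = rng.integers(len(patterns), size=steps)   # random input sequence
    m = np.array(m0, dtype=float)
    norms = [np.linalg.norm(m)]
    for k in order:
        d = patterns[k]
        c = m @ d                 # postsynaptic response
        m += eta * c * d          # correlated firing only ever strengthens synapses
        norms.append(np.linalg.norm(m))
    return m, np.array(norms)

patterns = np.array([[1.0, 0.2], [0.2, 1.0]])   # hypothetical positive inputs
m_final, norms = hebbian_growth([0.1, 0.1], patterns)
responses = patterns @ m_final
```

With positive inputs and a positive initial state, every step increases the synaptic strengths, so without a bound or a decrease term the state grows indefinitely and the responses to the two patterns stay comparable.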

For a general form of synaptic modification, we write:

$$\dot{m}_j = F(d_j, \ldots, m_j;\; d_k, \ldots, c, \ldots, \bar{c};\; X, \ldots, Y, \ldots, Z), \qquad (2.3)$$

where the first set of variables, $d_j, \ldots, m_j$, are what we call local, the second
set, $d_k, \ldots, c, \ldots, \bar{c}$, what we call quasi-local, and the third set what
we call global. Local variables such as $d_j, \ldots, m_j$ are those directly at the
synaptic site; any information they carry is thus directly available. Quasi-local
variables are those such as $d_k, \ldots, c, \bar{c}$. These are physically connected
to the synaptic site by the cell itself. However, in order that the information
they contain be available, some means of internal cellular
communication must be assumed. Note that we include among these such
variables as $\bar{c}$ (the averaged activity of the cell over time). Global
variables are called $X, \ldots, Y, \ldots, Z$.

In  work  done  in  the  past  few  years  we  have  explored  a  form  of  synaptic 
modification that can be written as follows. Referring to the $j$th synaptic
junction:

$$\dot{m}_j = \phi(c, \bar{c})\, d_j - \epsilon m_j. \qquad (2.4)$$

Note  that  as  in  passive  modification,  the  output  of  a  cell  is  determined  by 
the  input  and  the  already  existing  synaptic  strengths  as  well  as  by 
noise-like fluctuations. The precise form of $\phi$ is not critical as long as it
has certain general characteristics. Cooper, Liberman, and Oja (1979)
(CLO) showed that if the function $\phi$ goes through zero then the sharpness
of the tuning curve is altered by the visual experience of the animal
in agreement with what is observed. This modification might be called
'Hebbian' when the output is above the modification threshold, $\theta_M$, and
'anti-Hebbian' when the output is below this threshold. The function $\phi$
is also assumed to have a dependence on global variables, not explicitly
written. CLO thus assumed that the modifiability of a synaptic junction is
dependent on events that occur at different parts of the same cell and on
the rate at which the cell responds. They proved several theorems which
show  that  with  this  form  of  passive  modification  there  is  an  increase  in  the 
specificity  of  the  response  of  a  cortical  cell  to  visual  input  (sharpening  of 
its  tuning  curve)  when  that  cell  is  exposed  to  stimuli  that  are  the  result  of 
normal  patterned  visual  experience  and  a  loss  of  specificity  when  that  cell 
is  exposed  to  noise-like  input,  such  as  might  be  expected  when  an  animal 
is  dark-reared  or  raised  with  eyelids  sutured.  Specificity  can  be  regained, 
however,  with  a  return  of  input  due  to  patterned  vision. 
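
The 'Hebbian' and 'anti-Hebbian' regimes can be made concrete with a small sketch. The quadratic form `c * (c - theta_m)` is only one hypothetical function that crosses zero at a fixed threshold; the text requires the sign change, not this particular shape, and the dependence on the averaged activity is omitted here.

```python
import numpy as np

def phi_fixed_threshold(c, theta_m):
    """Hypothetical modification function with a fixed threshold theta_m.

    Negative for 0 < c < theta_m, positive for c > theta_m, zero at c = 0.
    """
    return c * (c - theta_m)

def synaptic_change(m, d, theta_m, eps=0.0):
    """Right-hand side of Eq. (2.4) for one input d (fixed-threshold variant)."""
    m = np.asarray(m, dtype=float)
    d = np.asarray(d, dtype=float)
    c = float(m @ d)
    return phi_fixed_threshold(c, theta_m) * d - eps * m

d = np.array([1.0, 0.5])
strong = synaptic_change([2.0, 1.0], d, theta_m=1.0)   # c = 2.5, above threshold
weak = synaptic_change([0.2, 0.2], d, theta_m=1.0)     # c = 0.3, below threshold
```

In this sketch an output above threshold strengthens every active synapse, while an output below threshold weakens them.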

In  addition  to  this  basic  behavior,  simulations  and  mathematical  results 
on the asymptotic states of the neural network show some more subtle
phenomena that depend upon values of system parameters. Of note are
the  rate  of  decay  (forgetting  per  unit  time),  the  strength  of  selective 
modification  of  synaptic  junctions,  the  interaction  of  modifiable  with 
non-modifiable  synapses,  and  the  different  statistical  properties  of  noise 
factors. 

The reason for the increase of selectivity is the crossover of the $\phi$
function from the negative to the positive region at the modification
threshold $\theta_M$. This was recognized by CLO to be associated with some
property of the cell, possibly the average firing rate. This idea was
enlarged and extended by Bienenstock, Cooper and Munro
(1982) (BCM) and applied to a great variety of situations
in visual cortex. The essential idea of BCM was
to allow $\theta_M$ to vary non-linearly with the average activity of
the cell, $\bar{c}$. Doing this they achieved a variety of desirable properties as
well as a theoretical structure in excellent agreement with available
experimental data. The crucial point in the choice of the function $\phi(c, \bar{c})$
is the determination of the threshold $\theta_M(t)$, i.e., the value of $c$ at which
$\phi(c, \bar{c})$ changes sign. A candidate for $\theta_M(t)$ is the average value of the
postsynaptic firing rate, $\bar{c}(t)$. The time average is meant to be taken over
a period $T$ preceding $t$ much longer than the membrane time-constant $\tau$,
so that $\bar{c}(t)$ evolves on a much slower timescale than $c(t)$. This can usually
be approximated by averaging over the distribution of inputs for a given
state $m(t)$:

$$\bar{c}(t) = m(t) \cdot \bar{d}. \qquad (2.5)$$

This  results  in  an  essential  feature,  the  instability  of  low  selectivity  points. 

(This  can  be  most  easily  seen  at  zero  selectivity  equilibrium  points,  where, 
with  any  perturbation,  the  state  is  driven  away  from  this  equilibrium, 
whatever  the  input.) 

Therefore,  if  stable  equilibrium  points  exist  in  the  state  space,  they  are 
of  high  selectivity.  However,  do  such  points  exist  at  all?  The  answer  is 
generally yes, provided that the state is bounded from the origin and from
infinity. These conditions, instability of low-selectivity equilibria as well as
boundedness, are fulfilled by a single function $\phi(c, \bar{c})$ if we define $\theta_M(t)$ to
behave as a nonlinear function of $\bar{c}(t)$, for example, a power. The
exponent should then be larger than 1. The final requirement on $\phi(c, \bar{c})$
thus reads:

$$\operatorname{sign} \phi(c, \bar{c}) = \operatorname{sign}\left(c - \left(\frac{\bar{c}}{c_0}\right)^{p} \bar{c}\right) \quad \text{for } c > 0, \qquad \phi(0, \bar{c}) = 0 \quad \text{for all } \bar{c}, \qquad (2.6)$$

where $c_0$ and $p$ are two fixed positive constants. The threshold $\theta_M(\bar{c}) =
(\bar{c}/c_0)^p\, \bar{c}$ thus serves two purposes: allowing threshold modifications when
$\bar{c} \simeq c_0$ as well as driving the state from regions such that $\bar{c} \ll c_0$ or $\bar{c} \gg c_0$.
The process of synaptic growth, starting near 0 to eventually end in a
stable selective state, may be described as follows. Initially, $\bar{c} \ll c_0$, hence
$\phi(c, \bar{c}) > 0$ for all inputs in the environment: the responses to all inputs
grow. With this growth $\bar{c}$ increases, thus increasing $\theta_M$. Now some inputs
result in postsynaptic responses that exceed $\theta_M$, while others (those
whose direction is far away, i.e., close to orthogonal, from the favored
inputs) give a response less than $\theta_M$. The response to the former continues
to grow while the response to the latter decays. This results in a
form of competition between incoming patterns rather than competition
between synapses. The response to unfavored patterns decays until it
reaches 0, where it stabilizes, for $\phi(0, \bar{c}) = 0$ for any $\bar{c}$. The response to
favored patterns grows until the mean response $\bar{c}$ is high enough, and the
state stabilizes. This occurs in spite of the fact that many complicated
geometrical relationships may exist between different patterns, i.e., that
they are not orthogonal, since different patterns may and certainly do
share common synapses.
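
The growth-then-competition sequence described above can be reproduced in a small simulation. This is a sketch under stated assumptions: the particular function `phi = c * (c - theta_M)` is only one hypothetical choice satisfying the sign condition on the modification function, the decay term is set to zero, the averaged activity is approximated by the mean response over the two inputs as in (2.5), and the patterns, constants and step size are illustrative.

```python
import numpy as np

def theta_m(c_bar, c0=1.0, p=2.0):
    """Sliding threshold theta_M = (c_bar / c0)**p * c_bar."""
    return (c_bar / c0) ** p * c_bar

def phi(c, c_bar, c0=1.0, p=2.0):
    """One hypothetical modification function: c * (c - theta_M)."""
    return c * (c - theta_m(c_bar, c0, p))

def simulate(patterns, m0, dt=0.01, steps=40000, seed=0):
    """Euler integration of Eq. (2.4) without decay, inputs drawn at random."""
    rng = np.random.default_rng(seed)
    order = rng.integers(len(patterns), size=steps)
    d_mean = patterns.mean(axis=0)          # equiprobable inputs
    m = np.array(m0, dtype=float)
    for k in order:
        d = patterns[k]
        c = m @ d                            # response to the current input
        c_bar = m @ d_mean                   # averaged activity, Eq. (2.5)
        m += dt * phi(c, c_bar) * d
    return m

patterns = np.array([[1.0, 0.2], [0.2, 1.0]])    # two inputs with positive cosine
m_final = simulate(patterns, m0=[0.1, 0.2])
final_responses = patterns @ m_final
selectivity = 1.0 - final_responses.mean() / final_responses.max()
```

In this sketch both responses grow at first; once the threshold has risen, one response continues to grow while the other decays toward zero, so the final state responds to a single pattern.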

Any function $\phi$ that satisfies (2.6) will give these qualitative results.
The precise form of this function (e.g., the numerical values of $p$ and $c_0$)
will affect the detailed behavior of the system such as rate of convergence,
height of the maximum for a selective cell as well as a variety of other
more subtle effects. We are presently investigating the consequences of
various detailed assumptions concerning the form of $\phi(c, \bar{c})$ and comparing
these with existing and proposed experiments. In doing this we hope
to arrive at a detailed understanding of the form of the function that
controls synaptic modification.

We note also that with this form of modification, the control of $\theta_M$ by a
global signal (in addition to $\bar{c}$) could produce the following results: If $\theta_M$
is set to be very large the cell's response would diminish. This will result
in a behavior that is like that described by Eric Kandel in Aplysia
habituation experiments. If $\theta_M$ is set very low the cell will rapidly increase
its response to a stimulus. This could be related to a type of sensitization
in which the sensitizing signal has the effect of resetting $\theta_M$ to a very low
level. For a variable $\theta_M$, as will be shown below applied to visual cortex,
one gets increasing and decreasing of selectivity such as those seen in
experimental results over the last generation. We thus have the possibility
that a single mechanism of modification, functioning in slightly different
ways, can account for a variety of experimental data in both invertebrates
and vertebrates.

[Figure 14: three plots of $\phi(c)$ against $c$; the surviving panel labels are $\bar{c} \gg c_0$ and $\bar{c} \ll c_0$.]

Fig. 14. A function satisfying condition (2.6). The three diagrams show the behavior of
$\phi(c, \bar{c})$ as a function of $c$ for three different constant values of $\bar{c}$. In each diagram, the solid
part of the curve represents $\phi(c, \bar{c})$ in the vicinity of $\bar{c}$, which is the relevant part of this
function.


2.3. Some mathematical results

Selectivity

It is common usage to estimate the orientation selectivity of a single
visual cortical neuron by measuring the half-width at half-height (or an
equivalent quantity) of its orientation tuning curve. The selectivity is
then measured with respect to a parameter of the stimulation, namely the
orientation, which takes on values over an interval of 180°. In our work,
various kinds of inputs are considered, e.g., formal inputs with a parameter
taking values on a finite set of points, rather than a continuous interval. It
will then be useful to have a convenient general index of selectivity, defined
in all cases. We propose the following:

$$\mathrm{Sel}_d(\mathcal{N}) = 1 - \frac{\text{mean response of } \mathcal{N} \text{ with respect to } d}{\text{maximum response of } \mathcal{N} \text{ with respect to } d}$$

With this definition, selectivity is estimated with respect to, or in, an
environment for the neuron, that is, a random variable $d$ that takes on
values in the space of inputs to the neuron $\mathcal{N}$. The variable $d$ represents a
random input to the neuron, and is characterized by its probability
distribution, which may be discrete or continuous. (During normal
development, the input to the neuron (or neural network) is presumably
distributed uniformly over all orientations. In abnormal rearing conditions
(e.g., dark rearing) the input during development could be different
from the input for measuring selectivity. How this should be translated in
the formal space $R^N$ will be discussed later.) This distribution defines an
environment, mathematically a random variable $d$. Selectivity is estimated
(before, or after development) with respect to this same environment.
Obviously, $\mathrm{Sel}_d(\mathcal{N})$ always falls between 0 and 1, and the higher the
selectivity of $\mathcal{N}$ in $d$, the closer $\mathrm{Sel}_d(\mathcal{N})$ is to 1.

We analyze the behavior of (2.4) for $\epsilon = 0$. The behavior depends
critically on the environment, that is, on the distribution of the stationary
stochastic process $d$. Two classes of distributions may be considered:

(a) Discrete distributions ($K$ possible inputs $d^1, \ldots, d^K$). These are
generally assumed to occur with the same probability $1/K$. The process $d$
is then a jump process which randomly assumes new values at each time
increment. The vector $m$ is (roughly) a Markov process.

(b) Continuous distributions: in the work of BCM, the only continuous
distribution that is considered is a uniform distribution $d$ over a closed
one-parameter curve in the input space $R^N$.

Although the principles underlying the convergence to selective states are
intuitively fairly simple, mathematical analysis of the system is not entirely
straightforward, even for the simplest $d$. Mathematical results, obtained only
for certain discrete distributions, are of two types: (1) equilibrium points are
locally stable if and only if they are of highest available selectivity with
respect to the given distribution of $d$; (2) given any initial value of $m$ in the
state space, the probability that $m(t)$ converges to one of the maximum
selectivity fixed points as $t$ goes to infinity is 1. Results of the second type are
much stronger, and require a tedious geometrical analysis. Results are stated
here in a somewhat simplified form. For exact statements and proofs, the
reader is referred to Bienenstock (1980) or to BCM (1982). To illustrate,
we study the simple case where $d$ takes on values on only two possible
input vectors $d^1$ and $d^2$ that occur with the same probability, and let $\epsilon = 0$
in (2.4):

$$P[d = d^1] = P[d = d^2] = 1/2.$$

Whatever the real dimension $N$ of the system, it reduces to two dimensions.
(Any component of $m$ outside the linear subspace spanned by $d^1$
and $d^2$ will eventually decay to 0 due to the uniform decay term.)
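
The selectivity index defined in this section is straightforward to compute for a discrete environment. The sketch below is illustrative: the function name, the equiprobable default and the choice to raise an error when no response is positive are assumptions of the example, not conventions fixed by the text.

```python
import numpy as np

def selectivity(m, inputs, probs=None):
    """Sel_d(N) = 1 - (mean response) / (maximum response) over the environment."""
    inputs = np.asarray(inputs, dtype=float)
    responses = inputs @ np.asarray(m, dtype=float)
    if probs is None:
        probs = np.full(len(inputs), 1.0 / len(inputs))   # equiprobable inputs
    mean_response = float(np.dot(probs, responses))
    max_response = float(responses.max())
    if max_response <= 0.0:
        raise ValueError("selectivity is undefined when no response is positive")
    return 1.0 - mean_response / max_response

env = [[1.0, 0.0], [0.0, 1.0]]                 # hypothetical two-input environment
sel_selective = selectivity([0.0, 2.0], env)   # responds to the second input only
sel_flat = selectivity([1.0, 1.0], env)        # responds equally to both inputs
```

A state that responds to only one of two equiprobable inputs has mean response equal to half its maximum, which is why the index cannot exceed 1/2 in a two-input environment.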


Analytic  results  in  two  dimensions 

It follows immediately from the definition that the maximum value of
$\mathrm{Sel}_d(m)$ in the state space is 1/2. It is reached for states $m$ which give null
response when $d^1$ comes in (i.e., are orthogonal to $d^1$) but positive
response for $d^2$, or vice versa. Minimum selectivity, namely 0, is
obtained for states $m$ such that $m \cdot d^1 = m \cdot d^2$. Equilibrium states of both
kinds indeed exist.

Lemma 1. Let $d^1$ and $d^2$ be linearly independent and $d$ satisfy $P[d = d^1] =
P[d = d^2] = 1/2$. Then for any $\phi$ satisfying (2.6) the system (2.4) admits
exactly 4 fixed points, $m^0$, $m^1$, $m^2$ and $m^{1,2}$, with $\mathrm{Sel}_d(m^0) = \mathrm{Sel}_d(m^{1,2}) = 0$
and $\mathrm{Sel}_d(m^1) = \mathrm{Sel}_d(m^2) = 1/2$. (Here the superscripts indicate which of the
$d^i$ are not orthogonal to $m$. [$m^0$ is the origin.] Thus for instance $m^1 \cdot d^1 > 0$,
$m^1 \cdot d^2 = 0$.)
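
The count of four fixed points can be checked with a short worked calculation. The sketch below assumes zero decay, responses $c^i = m \cdot d^i \ge 0$, and the averaged activity taken as the mean of the two responses, as in (2.5); it illustrates the statement and is not the proof given in the cited references.

```latex
% Fixed points of (2.4) for two equiprobable, linearly independent inputs.
\begin{align*}
\tfrac12\left[\phi(c^1,\bar c)\,d^1 + \phi(c^2,\bar c)\,d^2\right] = 0
  &\;\Longrightarrow\; \phi(c^1,\bar c) = \phi(c^2,\bar c) = 0, \\
\phi(c^i,\bar c) = 0
  &\;\Longleftrightarrow\; c^i \in \{0,\ \theta_M(\bar c)\},
  \qquad \theta_M(\bar c) = (\bar c / c_0)^p\,\bar c,
  \qquad \bar c = \tfrac12\,(c^1 + c^2), \\
m^0 &:\quad c^1 = c^2 = 0, \\
m^1 &:\quad c^2 = 0,\ \ c^1 = \theta_M = 2\bar c
  \;\Longrightarrow\; \bar c = 2^{1/p} c_0,\ \ c^1 = 2^{1+1/p} c_0,\ \
  \mathrm{Sel}_d(m^1) = 1 - \bar c / c^1 = \tfrac12, \\
m^{1,2} &:\quad c^1 = c^2 = \theta_M = \bar c
  \;\Longrightarrow\; \bar c = c_0,\ \ \mathrm{Sel}_d(m^{1,2}) = 1 - \bar c / \bar c = 0.
\end{align*}
```

The case $m^2$ follows from $m^1$ by exchanging the two inputs, which gives the four combinations in total.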

The behavior of the system depends on the geometry of the inputs, in
the present case on $\cos(d^1, d^2)$. The crucial assumption that is needed
here is that $\cos(d^1, d^2) > 0$. This is a reasonable assumption which is
obviously satisfied if all components of the inputs are positive, as is
assumed in some models (Von der Malsburg (1973); Perez et al. (1975)).
We may then state the following:

Theorem 1. Assume that in addition to the conditions of Lemma 1,
$\cos(d^1, d^2) > 0$. Then $m^0$ and $m^{1,2}$ are unstable, $m^1$ and $m^2$ are stable, and
whatever its initial value, the state of the system converges almost surely
(i.e., with probability 1) either to $m^1$ or to $m^2$.

Theorem 1 is the basic result in the 2-dimensional setting: it characterizes
evolution schemes based on competition between patterns, saying
that the state eventually reaches maximal selectivity even when the two
input vectors are very close to one another. Obviously this requires that
some of the synaptic strengths be negative since the neuron has linear
integrative power. Inhibitory connections are thus necessary to obtain
selectivity. Some selectivity is also realizable with no inhibitory connections
(not even 'intracortical' ones) if the integrative power is appropriately
nonlinear. However, whatever the nonlinearity of the integrative
power, Theorem 1 could not hold for evolution equations based
on competition between converging afferents.

In Theorem 1, we have a discrete sensory environment which consists
of exactly two different stimuli, a situation that, although simple mathematically,
is not often encountered in nature. It may, however, very well
correspond to a visual environment restricted to only horizontally and
vertically oriented contours, present with equal probability. Theorem 1
then predicts that cortical cells will develop a selective response to one of
the two orientations, with no preference for either (other than what may
result from initial connectivity). Thus, on a large sample of cortical cells,
one should expect as many cells tuned to the horizontal orientation as to
the vertical one. So far, no assumption is made on intracortical circuitry.
We discuss this later.

The  proof  of  Theorem  1  is  based  on  the  existence  of  trap  regions 
around  each  of  the  selective  fixed  points: 

Theorem 2. Under the same conditions as in Theorem 1, there exists
around $m^1$ ($m^2$) a region $F^1$ ($F^2$), such that once the state enters $F^1$ ($F^2$), it
converges almost surely to $m^1$ ($m^2$).

The meaning of Theorem 2 is the following: once $m(t)$ has reached a
certain selectivity, it cannot 'switch' to another selective region. Applied
to cortical cells in a patterned visual environment, this means that once
they become sufficiently committed to certain orientations, they will
remain committed to those orientations (provided that the visual
environment does not change), becoming more selective as they stabilize
to some maximal selectivity. Theorems 1 and 2 are illustrated in Fig. 15.

It is worth mentioning that when $\cos(d^1, d^2) < 0$, the situation is much
more complicated: trap regions don't necessarily exist and periodic
asymptotic behavior, i.e., limit cycles, may occur, bifurcating from the
stable fixed points when $\cos(d^1, d^2)$ becomes too negative (see Bienenstock
(1980)).

Higher dimensions

We  now  turn  to  the  case  where  d  takes  on  K  values.  The  following  is 
easily  obtained: 

Lemma 2. Let $d^1, d^2, \ldots, d^K$ be linearly independent and $d$ satisfy $P[d =
d^1] = \cdots = P[d = d^K] = 1/K$. Then, for any $\phi$ satisfying (2.6), (2.4) admits
exactly $2^K$ fixed points with selectivities $0, 1/K, 2/K, \ldots, (K - 1)/K$. There
are $K$ fixed points $m^1, \ldots, m^K$ of selectivity $(K - 1)/K$.

Fig. 15. The phase portrait of equation (2.4) subject to condition (2.6). The diagram shows
the trajectories of the state of the neuron starting from different initial points. The final state
of the system ($m^1$ or $m^2$) is determined when the trajectory enters the corresponding 'trap'
(shaded) region ($F^1$ or $F^2$).


Obviously, $(K - 1)/K$ is also the maximum possible selectivity with
respect to $d$. It means a positive response for one and only one of the
inputs. The situation is now much more complicated than it was with
only 2 inputs: it is not obvious whether in all cases assuming that all the
cosines between inputs are positive is sufficient to yield stability of the
maximum selectivity fixed points. However, we may state the following:

Theorem 3. Assume, in addition to the conditions of Lemma 2, that
$d^1, \ldots, d^K$ are all mutually orthogonal or close to orthogonal. Then the $K$
fixed points of maximum selectivity are stable, and, whatever its initial
value, the state of the system converges almost surely to one of them.



The proof of Theorem 3 also involves trap regions around the $K$
maximally  selective  fixed  points,  and  the  analog  of  Theorem  2  is  true 
here. 

Although the general case has not yet been solved analytically, as will
be seen later, computer simulations suggest that for a fairly broad range
of environments, if $d^i \cdot d^j > 0$, even if $d^1, \ldots, d^K$ are far from being
mutually orthogonal, the $K$ fixed points of maximum selectivity are
stable.

Simulations suggest further that even if the $d^1, \ldots, d^K$ are not linearly
independent and are very far from being mutually orthogonal, the
asymptotic selectivity is close to its maximum value with respect to $d$.

Analytic  results  in  two  dimensions  and  computer  simulations  in  higher 
dimensions  indicate  that  the  form  of  synaptic  modification  described  here 
leads,  in  general,  to  the  evolution  of  maximum  selectivity  with  respect  to 
the environment. We are trying to extend the linear
analysis of stability performed in two dimensions to
higher dimensions. Stability analysis has been attempted
on systems of $K$ dimensions for general linearly
independent environments (Cooper et al. (1982)). The
same arguments that lead to statements of stability in two dimensions
apply in this general case. However, the technical difficulty increases. The
problem may be stated in terms of a $K$th order eigenvalue equation.
Local stability for an $s = 1$ fixed point will be assured if the eigenvalues of
the matrix of coefficients of the $K$ differential equations are negative.
Similarly, the instability of points for which $s \neq 1$ would be characterized
by the presence of positive eigenvalues. Since this matrix of coefficients
exhibits some symmetry, there is hope that the problem could be solved
analytically (for reasonable size $K$, the system of equations could be
solved numerically for special cases). This kind of analytic statement
would confirm that the states of high selectivity observed in computer
simulations are indeed stable asymptotic states.
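
The eigenvalue criterion can be tried numerically in the two-input case. The sketch below linearizes the input-averaged form of (2.4), without decay, around a selective and a non-selective fixed point using a finite-difference Jacobian. The function `phi`, the constants `C0` and `P`, and the two input vectors are hypothetical choices for illustration; they are not the system analyzed in the cited work.

```python
import numpy as np

C0, P = 1.0, 2.0   # hypothetical values of the constants c_0 and p

def phi(c, c_bar):
    """One hypothetical modification function: c * (c - theta_M)."""
    return c * (c - (c_bar / C0) ** P * c_bar)

def mean_drift(m, patterns):
    """Input-averaged right-hand side of Eq. (2.4) with zero decay."""
    c = patterns @ m
    c_bar = c.mean()                       # Eq. (2.5), equiprobable inputs
    return (phi(c, c_bar)[:, None] * patterns).mean(axis=0)

def jacobian(m, patterns, h=1e-6):
    """Central finite-difference Jacobian of the averaged drift at m."""
    n = len(m)
    jac = np.zeros((n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = h
        jac[:, k] = (mean_drift(m + step, patterns)
                     - mean_drift(m - step, patterns)) / (2 * h)
    return jac

patterns = np.array([[1.0, 0.2], [0.2, 1.0]])      # K = N = 2, positive cosine
c_star = 2.0 ** (1.0 + 1.0 / P) * C0                # response at a selective fixed point
m_selective = np.linalg.solve(patterns, [c_star, 0.0])
m_flat = np.linalg.solve(patterns, [C0, C0])        # equal response to both inputs

eig_selective = np.linalg.eigvals(jacobian(m_selective, patterns))
eig_flat = np.linalg.eigvals(jacobian(m_flat, patterns))
```

For these illustrative values, all eigenvalues at the selective point have negative real part, while the non-selective point has one positive eigenvalue. The origin is not examined because the linearization of this particular `phi` vanishes there.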

The monocular problem: a simple circular environment

We  now  apply  this  theory  to  the  problem  of  orientation  selectivity  and 
binocular  interaction  in  primary  visual  cortex.  The  ordinary  development 
of  these  properties  in  mammals  depends  to  a  large  extent  on  normal 
functioning of the visual system (i.e., normal visual experience) during the
first few weeks or months of postnatal life. This has been demonstrated
many times by various experiments, based mainly on the paradigm of
rearing the animal in a restricted sensory environment. We show that the
theory described above can account for both normal development and
development in restricted visual environments.

Consider  first  a  classical  test  environment  used  to  construct  the  tuning 
curve  of  cortical  neurons.  This  environment  consists  of  an  elongated  light 
bar  successively  presented  or  moved  in  all  orientations — in  a  random 
sequence — in  the  neuron's  receptive  field.  Thus  all  the  parameters  of  the 
stimulus are constant except one, the orientation, which is uniformly
distributed on a circularly symmetric closed path. We assume that the
retino-cortical pathways map this family of stimuli to the cortical neuron's
space of inputs in such a way as to preserve the circular symmetry (as
defined below). Thus, the typical theoretical environment that will be
used for constructing the neuron's tuning curve is a random variable $d$
uniformly distributed on a circularly symmetric closed one-parameter
family of points in the space $R^N$. The parameter coding orientation in the
receptive field is, in principle, continuous. However, for the purpose of
numerical simulations, the distribution is made discrete. Thus, $d$ takes on
values on the points $d^1, \ldots, d^K$.

The requirement of circular symmetry is expressed mathematically as
follows: the matrix of inner products of the vectors $d^1, \ldots, d^K$ is circular
(i.e., each row is obtained from its nearest upper neighbor by shifting it
one column to the right) and the rows of the matrix are unimodal. A
random variable $d$ uniformly distributed on such a set of points will be,
hereafter, called a circular environment. Such a $d$ may be roughly characterized
by 3 parameters: $N$, $K$ and a measure of the mutual geometrical
closeness of the $d^i$'s, for instance the minimum value of $\cos(d^i, d^j)$ over
the environment.
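
One way to build a set of vectors meeting this requirement is to shift a single bump-shaped profile around a ring. The sketch below is an assumption-laden example: the Gaussian profile, the width `sigma` and the choice $N = K$ are illustrative and are not the inputs used in the simulations reported here.

```python
import numpy as np

def circular_environment(K=8, sigma=1.5):
    """K input vectors in R^K: one bump profile shifted around a ring."""
    idx = np.arange(K)
    dist = np.minimum(idx, K - idx)                  # circular distance to position 0
    bump = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    return np.stack([np.roll(bump, k) for k in range(K)])

patterns = circular_environment()
gram = patterns @ patterns.T                         # matrix of inner products

# Each row is the row above shifted one column to the right (with wrap-around).
is_circular = all(np.allclose(gram[i + 1], np.roll(gram[i], 1))
                  for i in range(len(gram) - 1))
# Row 0 decreases from the diagonal out to the opposite side of the ring.
is_unimodal = bool(np.all(np.diff(gram[0, : len(gram) // 2 + 1]) < 0))
# Closeness parameter: smallest cosine between any two inputs.
norms = np.sqrt(np.diag(gram))
min_cosine = float((gram / np.outer(norms, norms)).min())
```

Widening the bump (larger `sigma`) raises `min_cosine`, which gives a more crowded environment in the sense of the third parameter mentioned above.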

We  are  now  faced  with  the  difficult  problem  of  specifying  the  stationary 
stochastic  process  that  represents  the  time-sequence  of  inputs  to  the 
neuron  during  development.  To  begin,  we  simplify  the  problem  by  giving 
the  stochastic  process  exactly  the  same  distribution  as  the  circular  d 
defined  above.  In  doing  so,  we  assume  that  development  of  orientation 
selectivity  is  to  a  large  extent  independent  of  other  parameters  of  the 
stimulus, e.g., contrast, shape, position in the receptive field, retinal
disparity for binocular neurons, etc. The elementary stimulus for a cortical
neuron is a rectilinear contrast edge or bar. Any additional pattern present
at the same time in the receptive field is regarded as random noise. (A
discussion of this point is given in Cooper et al. (1979).)

Simulations  show  the  following  behavior: 

(1) The state converges rapidly to a fixed point, or attractor.

(2) Various such attractors exist. For a given $d$ and $\phi$ they all have the
same selectivity, which is close to its maximum value in $d$.

(3) The asymptotic tuning curve is always unimodal. One may thus talk
of the preferred orientation of an attractor.

(4) There exists an attractor in each possible orientation.

(5) If there is no initial preference, all orientations have equal probability
of attracting the state. (Which one will become favored depends on
the exact sequence of inputs.) This does not hold for environments which
are not perfectly circular, at least for a single-neuron system such as the one
studied here.

The  system  thus  behaves  exactly  as  expected  from  the  results  of  the 
preceding  section. 


The binocular problem: a more complex environment

We now consider a binocularly driven cell. The firing rate of the neuron
at time $t$ is now given by

$$c(t) = m_l(t) \cdot d_l(t) + m_r(t) \cdot d_r(t), \qquad (2.8)$$

with evolution schemes for 'left' and 'right' states $m_l$ and $m_r$ straightforward
generalizations of (2.4). We have partitioned the input vector space
into a left space and a right space; hence $m$ goes to $(m_l, m_r)$ and $d$
becomes $(d_l, d_r)$. Since $d_l$ and $d_r$ can be independent, the topology of the
environment is potentially more complex.

Various possibilities exist for the input $(d_l, d_r)$: one may wish to
consider normal rearing (both $d_l$ and $d_r$ circular and presumably highly
correlated), monocular deprivation, binocular deprivation, and so on. The
vector $(d_l, d_r)$ is a stationary stochastic process, whose distribution is one
of the following, depending on the experimental situation one wishes to
reproduce:

Normal Rearing (NR):

$d_l(t) = d_r(t)$ for all $t$, and $d_r$ is circular. (Noise terms that may be added
to the inputs may or may not be stochastically independent.)

Uncorrelated Rearing (UR):

$d_l$ and $d_r$ are i.i.d. (independent, identically distributed): they have the
same circular distribution, but no statistical relationship exists between
them.

Binocular Deprivation (BD):

The $2N$ components of $(d_l, d_r)$ are i.i.d.: $d_l$ and $d_r$ are uncorrelated
noise terms.

Monocular Deprivation (MD):

$d_r$ is circular, $d_l$ is a noise term: $d_l = n$.
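
The four input distributions can be sketched as a small sampler. This is only an illustration under stated assumptions: the circular environment is represented by a discrete ring of N bump-shaped patterns, the noise is uniform with zero mean, and the names `circular_pattern`, `noise` and `sample_inputs` as well as all numerical values are hypothetical.

```python
import math
import random

N = 8  # synapses per eye; also the number of orientations on the ring (assumed)


def circular_pattern(k, width=1.0):
    """Activity bump centred on synapse k, wrapped around a ring of N synapses."""
    pattern = []
    for i in range(N):
        gap = abs(i - k) % N
        gap = min(gap, N - gap)  # shortest distance around the ring
        pattern.append(math.exp(-gap * gap / (2.0 * width * width)))
    return pattern


def noise(rng, amplitude=0.5):
    """Zero-mean i.i.d. noise vector, one component per synapse."""
    return [rng.uniform(-amplitude, amplitude) for _ in range(N)]


def sample_inputs(condition, rng):
    """Return one draw (d_l, d_r) for rearing condition NR, UR, BD or MD."""
    if condition == "NR":  # identical circular input to both eyes
        d = circular_pattern(rng.randrange(N))
        return d, list(d)
    if condition == "UR":  # independent draws from the same circular distribution
        return circular_pattern(rng.randrange(N)), circular_pattern(rng.randrange(N))
    if condition == "BD":  # noise to both eyes
        return noise(rng), noise(rng)
    if condition == "MD":  # right eye patterned, left eye noise (d_l = n)
        return noise(rng), circular_pattern(rng.randrange(N))
    raise ValueError("unknown rearing condition: " + condition)


rng = random.Random(0)
d_l, d_r = sample_inputs("MD", rng)
```

Each call returns one pair of input vectors; feeding a long sequence of such draws to a modification rule reproduces the stationary stochastic process assumed in the text.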

In the NR case, the inputs from the two eyes to a binocular cell are
probably well correlated. We therefore assume that they are equal, which
is mathematically equivalent. The BD distribution represents dark
discharge.

Uncorrelated or strabismic rearing (UR) involves presenting two fully
independent circular environments (a 'toroidal' environment). The final
state can be either monocular and specific or binocular and specific with
no correlation between the patterns preferred by the two eyes.

The results of binocular deprivation or (correlated) normal rearing are
just those of the monocular case. We assume that binocular stimuli
presented during NR are exactly correlated so that each pattern incident
to the left-eye synapses is consistently accompanied by a corresponding
pattern to the right-eye synapses. Since the left and right components of
each pair are identical, the cell tunes to the same pattern in each eye.
Binocularly deprived input environments consisted of stimulus
components uniformly distributed over some range with zero mean. In this
case (BD), the average response of the cell is null and so $\phi$ is always
non-negative, resulting in random fluctuations of the synaptic state.

The development of a neuron receiving patterned input from only one
eye (and uniform noise from the other) is somewhat surprising. The
response curve goes to maximum selectivity with respect to the open eye,
but, consistent with observation, the response to the closed eye does not
fluctuate randomly. Rather, the neuron becomes nonresponsive to inputs to
the deprived eye. Asymptotic convergence to this state is assured regardless
of the initial state. The theoretical implications for the reverse suture (RS)
paradigm are straightforward: a monocularly deprived neuron, having
reached a monocular selective state, is driven to another monocular selective
state preferring the newly opened eye upon reversal of suture.

This behavior relies upon some activity, albeit purely random, being
present in the afferents from the closed eye. Such noise may be due to
diffuse light through the eyelid or spontaneous firing of LGN and/or
retinal neurons. As a neuron becomes selective with respect to the open
eye, patterns which are preferred give a response near threshold
whereas the other patterns give a much lower response. In either case $\phi$
is near zero. Noise accompanying a preferred pattern drives the neuron
from the modification threshold, so the deprived synapses grow stronger.
However, the opposite effect weakens the synapses when non-preferred
patterns are presented. A mathematical demonstration of
this argument, given in Appendix C of Bienenstock
et al. (1982), is presented in 2.4.
2.4. Comparison of theory with classical experimental results

The simulated behavior of neurons in visual cortex with binocular
connectivity is illustrated in Fig. 16. The seemingly inconsistent
experimental results (MD vs. BD) are faithfully reproduced by computer
simulation. Each of these paradigms was tested in both deterministic and
stochastic simulation algorithms over several pattern sets. The model
withstood considerable noisy input; indeed successful simulation of some
paradigms (RS in particular) required that a noiselike component
accompany the 'pure' inputs.

Simulations of the behavior of the system in these different
environments give the following:

NR:  all  asymptotic  states  are  selective,  binocular  and  have  matching 
preferred  orientations  for  stimulation  through  each  eye. 

BD: the motion of the state $(m_l, m_r)$ resembles a random walk. (The
small exponential decay term is necessary here in order to prevent large
fluctuations.) The two tuning curves therefore undergo random
fluctuations that are essentially determined by the second-order statistics of the
input $d$. As can be seen from the figure, these fluctuations may sometimes
result in a weak orientation preference or unbalanced ocular dominance.
However, the system never stays in such states very long; its average state
in the long run is perfectly binocular and nonoriented. Moreover,
whatever the second-order statistics of $d$ and the circular environment in
which tuning curves are assessed, a regular unimodal orientation tuning
curve is rarely observed, and selectivity never exceeds 0.6. We may thus
conclude that orientation selectivity as observed in the NR case (both
experimental and theoretical) cannot be obtained from purely random
synaptic weights. It is worth mentioning here that prolonged dark rearing
has been reported to increase response variability (Leventhal and Hirsch
(1980)); a similar observation was made by Fregnac and Bienenstock
(1981).

MD and RS: The only stable equilibrium points are monocular and
selective. The system converges to such states whatever the initial
conditions. In particular, this accounts for reverse suture experiments
(Blakemore and Van Sluyters (1974); Movshon (1976)).

UR: In contrast to NR, monocular as well as binocular equilibria exist.
The asymptotic state generally observed with $m_l(0) = m_r(0) = 0$ is
monocular. (This should be attributed to the mismatched inputs from the two
eyes, as is done by most authors.) Asymptotic states are selective, and
when they are binocular, preferred orientations through each eye do not
necessarily coincide. It should be mentioned here that Blakemore and
Van Sluyters (1974) report that after a period of alternating monocular
occlusion, the remaining binocular cells may differ in their preferred
orientations for stimulation through each eye.

Fig. 16. Computer simulations of various rearing conditions. Initial (dashed) and final (solid)
responses to the two eyes are shown separately (left/right).

Selectivity  and  ocular  dominance 

As  an  example  of  the  kind  of  new  and  subtle  effects  that  are  contained  in 
this  theory,  we  consider  in  detail  the  sequence  in  which  ocular  dominance 
and  selectivity  develop  in  the  monocularly  deprived  environment. 

According to (2.8) the firing rate of a binocularly driven neuron at time $t$
is given by

$$c(t) = m_l(t) \cdot d_l(t) + m_r(t) \cdot d_r(t).$$

In a situation corresponding to monocular deprivation (patterned
information to the right eye, noise to the left) we can write for
the environment

$$d = (d_r, n),$$

and for the set of synaptic weights

$$m = (m_r, m_l),$$

where $m_r$ and $m_l$ are the synaptic weights from the right and left eyes
respectively.

In this situation $m_r$ goes to one of its selective fixed points as in the
monocular case. The only fixed point for $m_l$ in the noise-like environment
is zero; but this is unstable in the monocular case. It is instructive to
follow the behavior of $m_l$ in this binocular case.

Let $(x_r, x_l)$ be a small perturbation from equilibrium. The motion at
point $(m_r^* + x_r, x_l)$ is given by:

$$\dot{x}_r = \phi(m_r^* \cdot d_r + x_r \cdot d_r + x_l \cdot n,\; m_r^* \cdot \bar{d}_r + x_r \cdot \bar{d}_r)\, d_r, \tag{2.9r}$$

$$\dot{x}_l = \phi(m_r^* \cdot d_r + x_r \cdot d_r + x_l \cdot n,\; m_r^* \cdot \bar{d}_r + x_r \cdot \bar{d}_r)\, n, \tag{2.9l}$$

where we assume that the noise has zero mean.

We analyze separately, somewhat informally, the behavior of the two
equations. The stability of (2.9r) is immediate from the stability of the
selective state $m_r^*$ in the circular environment $d_r$. To analyze (2.9l) we
divide the range of the right eye input $d_r$ into three classes:

(i) $d_r$ is such that $m_r^* \cdot d_r$ is either far above threshold, $\theta_M$, and
therefore $\phi(m_r^* \cdot d_r, m_r^* \cdot \bar{d}_r) > 0$, or far below threshold, $\theta_M$, (but still
positive) and therefore $\phi(m_r^* \cdot d_r, m_r^* \cdot \bar{d}_r) < 0$;

(ii) $d_r$ is such that $m_r^* \cdot d_r$ is near threshold, $\theta_M$, and therefore
$\phi(m_r^* \cdot d_r, m_r^* \cdot \bar{d}_r) \approx 0$;

(iii) $d_r$ is such that $m_r^* \cdot d_r \approx 0$ and again $\phi(m_r^* \cdot d_r, m_r^* \cdot \bar{d}_r) \approx 0$.

For the first class of inputs, the sign of $\phi$ is determined by $d_r$ alone,
hence (2.9l) is the equation of a random walk. To investigate the behavior
of (2.9l) in the two other cases, we neglect the term $x_r$ and linearize $\phi$
around the relevant one of its two zeros. It is easy to see that case (ii)
yields

$$\dot{x}_l = \varepsilon_1 (x_l \cdot n)\, n, \tag{2.10}$$

whereas in case (iii) one obtains

$$\dot{x}_l = -\varepsilon_2 (x_l \cdot n)\, n, \tag{2.11}$$

where $\varepsilon_1$ and $\varepsilon_2$ are positive constants, measuring respectively the
absolute value of the slope of $\phi$ at the modification threshold and at zero.

Since $n$ is a noise-like term, its distribution is presumably symmetric
with respect to $x_l$, so that averaging (2.10) and (2.11) yields respectively

$$\dot{x}_l = \varepsilon_1 \overline{n^2}\, x_l, \tag{2.12}$$

$$\dot{x}_l = -\varepsilon_2 \overline{n^2}\, x_l, \tag{2.13}$$

where $\overline{n^2}$ is the average squared magnitude of the noise input to a single
synaptic junction from the closed eye.
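
The averaging step from (2.10) and (2.11) to (2.12) and (2.13) rests on the identity that, for zero-mean noise with independent components of equal variance, the mean of $(x_l \cdot n)\, n$ is the per-component mean square of the noise times $x_l$. The following sketch checks this numerically. It assumes uniform noise on $[-1, 1]$ (mean square 1/3); the function name, the test vector and the sample count are illustrative choices, not part of the original analysis.

```python
import random


def average_drift(x, amp=1.0, samples=40000, seed=0):
    """Monte Carlo estimate of the mean of (x . n) n for i.i.d. uniform noise on [-amp, amp]."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        n = [rng.uniform(-amp, amp) for _ in x]
        proj = sum(xi * ni for xi, ni in zip(x, n))  # x . n
        for i, ni in enumerate(n):
            total[i] += proj * ni
    return [t / samples for t in total]


x_l = [1.0, -2.0, 0.5, 0.0]      # hypothetical perturbation of the closed-eye weights
mean_sq = 1.0 / 3.0              # mean square of a uniform variable on [-1, 1]
estimate = average_drift(x_l)
predicted = [mean_sq * xi for xi in x_l]
```

The estimate approaches the predicted vector as the number of samples grows, so on average the update is proportional to $x_l$ itself, with a sign set by whether the open-eye input falls in class (ii) or class (iii).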

We thus see that input vectors from the first class move $x_l$ randomly,
inputs from the second class drive it away from 0, whereas inputs from
the third drive it toward 0. In the case where the range of $d_r$ is a set of $K$
linearly independent vectors and $m_r^*$ is of maximum selectivity, $(K - 1)/K$,
case (i) does not occur at all. (The random contribution occurs only
before the synaptic strengths from the open eye have settled to one of
their fixed points.) Case (ii) occurs only for one input, say $d^1$, with $m_r^* \cdot d^1$
exactly equal to threshold, $\theta_M$, and (iii) occurs for the other $K - 1$ vectors
which are all orthogonal to $m_r^*$. In the general case ($d_r$ any circular
environment), the more selective $m_r^*$ with respect to $d_r$, the higher the
proportion of inputs belonging to class (iii), the class that yields (2.13), i.e.,
that brings $x_l$ back to 0.
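
The counting argument for $K$ input vectors can be illustrated with a small sketch. It makes several assumptions that are not in the original: the $K$ patterns are taken to be orthonormal (a special case of linear independence), the threshold value and tolerance are arbitrary, and selectivity is computed as one minus the ratio of mean to maximum response. The names `classify` and `selectivity` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def classify(response, theta_m, tol=0.05):
    """Assign an open-eye response to class (i), (ii) or (iii) of the text."""
    if abs(response - theta_m) <= tol * theta_m:
        return "ii"   # near the modification threshold
    if abs(response) <= tol * theta_m:
        return "iii"  # near zero
    return "i"        # clearly above or below threshold


def selectivity(responses):
    """One minus mean response over maximum response (assumed definition)."""
    return 1.0 - (sum(responses) / len(responses)) / max(responses)


K = 4
theta_m = 2.0  # hypothetical threshold value
# Orthonormal patterns d^1 ... d^K (assumed for simplicity).
patterns = [[1.0 if i == k else 0.0 for i in range(K)] for k in range(K)]
# A maximally selective state: at threshold for d^1, orthogonal to the rest.
m_star = [theta_m * v for v in patterns[0]]

responses = [dot(m_star, d) for d in patterns]
classes = [classify(c, theta_m) for c in responses]
sel = selectivity(responses)
```

With these choices one input falls in class (ii), the other $K - 1$ fall in class (iii), none fall in class (i), and the selectivity equals $(K - 1)/K$.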
The stability of the global system still depends on the ratio of the
quantities $\varepsilon_1$ and $\varepsilon_2$ as well as on the statistics of the noise term $n$ (e.g. its
mean square norm). We may however formulate two general conclusions.
First, under reasonable assumptions ($\varepsilon_1$ of the order of $\varepsilon_2$ and the mean
square norm of $n$ of the same order as that of $d_r$) $x_l = 0$ is stable on the
average for a selective $m_r^*$. Second, the residual fluctuation of $x_l$ around
0, essentially due to inputs $d_r$ in classes (i) and (ii), is smaller for highly
selective $m_r^*$'s than it is for mildly selective ones.

Thus, one should expect that in a monocularly deprived environment
nonselective neurons tend to remain binocularly driven. In addition, since
it is the non-preferred inputs from the open eye accompanied by noise
from the closed eye (case (iii)) that drive the response to the closed eye
to zero, if inputs to the open eye were restricted to preferred inputs (case
(ii)) even a selective cell would remain less monocular.
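
The competition between class (ii) and class (iii) inputs can be sketched by iterating the linearized updates (2.10) and (2.11) directly. This is only an illustration of the linearized argument, not a simulation of the full modification rule: the step size, the equal slopes, the uniform noise, the dimension and the function name `simulate_closed_eye` are all assumptions made for the example.

```python
import math
import random


def simulate_closed_eye(p_preferred, steps=2000, rate=0.05, eps1=1.0, eps2=1.0,
                        dim=8, seed=1):
    """Iterate x_l <- x_l + rate * s * (x_l . n) n and return the final norm of x_l.

    s = +eps1 when the open-eye input is preferred (class (ii), probability
    p_preferred) and s = -eps2 when it is non-preferred (class (iii)).
    """
    rng = random.Random(seed)
    x = [0.1] * dim  # small initial closed-eye perturbation
    for _ in range(steps):
        n = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # closed-eye noise
        s = eps1 if rng.random() < p_preferred else -eps2
        proj = sum(xi * ni for xi, ni in zip(x, n))       # x_l . n
        x = [xi + rate * s * proj * ni for xi, ni in zip(x, n)]
    return math.sqrt(sum(xi * xi for xi in x))


start = math.sqrt(8 * 0.1 ** 2)
# Selective cell in a full environment: one preferred input out of eight.
mixed = simulate_closed_eye(p_preferred=1.0 / 8.0)
# Open eye restricted to the preferred input only.
only_preferred = simulate_closed_eye(p_preferred=1.0, steps=200)
```

In the first run the closed-eye perturbation shrinks toward zero, while in the second it grows away from zero; the growth only signals that $x_l = 0$ is no longer stable in the linearized equations, since the linearization ceases to apply once the perturbation is large.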


Fig. 17. Progression of development of selectivity and ocular dominance. Note that
selectivity develops for the open eye before the response to the closed eye is driven to zero.
To better confront these ideas with experiment, the
single (BCM) neuron must be placed in a network with
the anatomical features of visual cortex, a network in
which inhibitory and excitatory cells receive input
from LGN and from each other. This has been done
(Scofield and Cooper, to be published). Their
conclusions are similar to those above with explicit further
statements  concerning  the  independent  effects  of 
excitatory  and  inhibitory  neurons  on  selectivity  and 
ocular  dominance.  For  example,  shutting  off  inhibitory 
cells  lessens  selectivity  and  alters  ocular  dominance 
giving  'masked  synapse'  effects. 

Quantitative tests of progressions such as those shown
in Figure 17 are in progress in our laboratory. We
hope that such experiments can provide detailed
comparisons with theory and provide us with a sensitive tool for
determining synaptic modification among various classes
of neurons: a possible entry to the process by which
the nervous system organizes itself.